Nikolay Karpov
NRU Higher School of Economics, Nizhny Novgorod, Russia
New Tang Dynasty Television (ntd.tv), NYC, USA
Nanyang Technological University
School of Computer Science and Engineering
Singapore
News Recommender System:
From Math Model to Production Solution
Agenda
 BIO
 Motivation
 Matrix factorization model with time
 Model for cold start problem
 Recommender based on neural network
 Deployment
 Conclusion
BIO
• Associate Professor at the National Research University Higher
School of Economics (NRU HSE), Nizhny Novgorod, Russia,
where he leads the Artificial Intelligence Laboratory.
• Data Scientist at New Tang Dynasty Television
(www.ntd.tv), NYC, USA. Previously a Data Scientist at
Neuron Ltd. (life.ru), Nizhny Novgorod, Russia
• Deputy Dean of the Faculty of Informatics, Mathematics, and
Computer Science at NRU HSE
• Academic Supervisor of «Applied Mathematics and
Information Science» and «Software Engineering»
undergraduate programmes at NRU HSE
Agenda
 BIO
 Motivation
 Matrix factorization model with time
 Model for cold start problem
 Recommender based on neural network
 Deployment
 Conclusion
Motivation
 Most websites have only implicit feedback
 State-of-the-art implicit models (BPR, WARP) don't use
time, but time is important in the news domain.
Our model is inspired by TimeSVD++
 Cold start problem with new users and new articles
We need additional features for users and items
 The algorithm should support GPUs to be fast
We review several neural network models and
implement ours on TensorFlow
Agenda
 BIO
 Motivation
 Matrix factorization model with time
 Model for cold start problem
 Recommender based on neural network
 Deployment
 Conclusion
Matrix factorization model
[Figure: sparse user–item interaction matrix (users U1–U9, items I1–I7, 1 = interaction) factorized as the product of a user-factor matrix and an item-factor matrix (factors F1–F3).]
Matrix factorization model
[Figure: user-factor matrix Ф_U (F1–F3 × U1–U9) and item-factor matrix Ф_I (F1–F3 × I1–I7).]

The predicted interest of user x in item y is the dot product of their factor vectors:

f(x, y) = Ф_U(x)ᵀ · Ф_I(y)
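The factorization score can be sketched in a few lines of NumPy. The matrix shapes follow the U1–U9 × I1–I7 illustration; the values are random placeholders, not trained factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 9, 7, 3   # U1..U9, I1..I7, F1..F3 as in the figure

Phi_U = rng.normal(size=(n_factors, n_users))   # user factor matrix
Phi_I = rng.normal(size=(n_factors, n_items))   # item factor matrix

def score(x, y):
    # f(x, y) = Phi_U(x)^T Phi_I(y): predicted affinity of user x for item y
    return float(Phi_U[:, x] @ Phi_I[:, y])

# Reconstructing the full interaction matrix does every dot product at once.
R_hat = Phi_U.T @ Phi_I
```

In practice the factors are learned so that R_hat approximates the observed interaction matrix.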
News dataset
 600-800 thousand unique users a day
 200-300 new articles a day
 2 - 3 millions of user interactions a day
 86% view in a first 24 hours after publishing
Model with time
 Ф_U – user factor matrix
 Ф_T(t) – vector of the average interest of users in time period t
 Ф_I – item factor matrix
 Ψ(y, t) – vector reflecting the average popularity of item y in time period t
Model with time
[Figure: the factor matrices Ф_U and Ф_I as before, now combined with the time-dependent terms Ф_T(t) and Ψ(y, t).]

f(x, y, t) = (Ф_U(x) + Ф_T(t))ᵀ · (Ф_I(y) + Ψ(y, t))
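The time-aware score can be sketched as follows; the additive combination of the time terms is my reading of the Ф_T and Ψ(y, t) definitions above (TimeSVD++-style), not code from the talk, and all values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, n_factors, n_periods = 9, 7, 3, 5

Phi_U = rng.normal(size=(n_factors, n_users))             # per-user factors
Phi_I = rng.normal(size=(n_factors, n_items))             # per-item factors
Phi_T = rng.normal(size=(n_factors, n_periods))           # average user interest per period
Psi   = rng.normal(size=(n_factors, n_items, n_periods))  # item popularity per period

def score(x, y, t):
    # (user factors + period-average interest) . (item factors + period popularity)
    user_vec = Phi_U[:, x] + Phi_T[:, t]
    item_vec = Phi_I[:, y] + Psi[:, y, t]
    return float(user_vec @ item_vec)
```

The time terms let a user's score drift with the audience's current interests and with an article's burst of popularity after publication.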
Optimization methods for
Implicit Feedback
 Bayesian Personalized Ranking (BPR)¹
Pairwise sampling, optimizes AUC
 Weighted Approximate-Rank Pairwise (WARP)²
Optimizes the top items in a ranked list
1. S. Rendle, C. Freudenthaler, and Z. Gantner, 2009
2. J. Weston, H. Yee, and R. J. Weiss, 2013
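The BPR objective can be sketched directly; this is a generic implementation of the published pairwise loss, not the talk's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(f_pos, f_neg):
    # BPR maximizes sigma(f_u(i) - f_u(j)) over sampled pairs: the positive
    # item i should score above the sampled negative j (a smooth AUC surrogate).
    diff = np.asarray(f_pos) - np.asarray(f_neg)
    return float(np.mean(-np.log(sigmoid(diff))))

# A confidently correct ranking gives a near-zero loss; a tie gives log(2).
low = bpr_loss([10.0], [0.0])
tie = bpr_loss([0.0], [0.0])
```

WARP differs in its sampling: it keeps drawing negatives until it finds one that violates the margin, and weights the update by the implied rank, which focuses learning on the top of the list.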
Positive and negative samples
[Table: the interaction matrix with sampled training pairs. Positive samples: (U1, I2), (U2, I2), (U3, I1), (U1, I6), (U2, I5), (U2, I6). Negative samples: (U1, I4), (U2, I1), (U3, I6), (U1, I1), (U2, I3), (U2, I7).]

Training pushes f_u(i) above f_u(j) for each positive item i and sampled negative item j.
Algorithm
Results on News Dataset
Results on MovieLens 10M
Agenda
 BIO
 Motivation
 Matrix factorization model with time
 Model for cold start problem
 Recommender based on neural network
 Deployment
 Conclusion
Cold start with new user
[Figure: the user–item interaction matrix and its factorization, as before; a new user arrives with an empty row, so factorization alone cannot place them in factor space.]
Additional user features
[Figure: the interaction matrix extended with two extra columns A1, A2 holding user attributes alongside the items I1–I7; the attribute columns are factorized together with the items, so a new user's attributes map them into the shared factor space.]
Cold start with new item
[Figure: the same factorization; a new item arrives with an empty column, so factorization alone cannot place it in factor space.]
Cold start with new item
[Figure: the interaction matrix extended with two extra rows A1, A2 holding item attributes; factorizing them with the users lets a new item be placed in factor space from its attributes alone.]
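How attribute rows make a brand-new item scoreable can be sketched as follows. The attribute factor matrix Phi_A and all values here are illustrative stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_factors, n_attrs = 9, 3, 2   # A1, A2 attribute rows from the figure

Phi_U = rng.normal(size=(n_factors, n_users))  # learned user factors
Phi_A = rng.normal(size=(n_factors, n_attrs))  # one learned factor vector per attribute

def score_new_item(x, item_attrs):
    # A brand-new article has no interaction column, but its attribute
    # indicator vector maps it into the shared factor space.
    item_vec = Phi_A @ np.asarray(item_attrs, dtype=float)
    return float(Phi_U[:, x] @ item_vec)

s = score_new_item(0, [1, 0])   # new article carrying attribute A1 only
```

The symmetric trick, with user attributes as extra columns, handles new users the same way.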
Agenda
 BIO
 Motivation
 Matrix factorization model with time
 Model for cold start problem
 Recommender based on neural network
 Deployment
 Conclusion
Deep Matrix Factorization
https://www.oreilly.com/ideas/deep-matrix-factorization-using-apache-mxnet
A neural network can model important non-linear combinations of factors to make better predictions.
Deep Matrix Factorization
[Figure: one-hot user and item vectors (U1–U9, I1–I7) fed into a network that outputs predicted interactions.]
Deep Matrix Factorization
[Figure: the user factor matrix (F1–F3) extended with additional user features A1–A2, and the item factor matrix (F1–F3) extended with item features A1–A3.]

Because the network can model non-linear combinations of factors, we can concatenate the factor matrices with additional user and item features.
Deep Matrix Factorization
[Figure: the same concatenated matrices, with the columns for one user and one item highlighted.]

A lookup layer selects the needed columns from these matrices.
Deep Matrix Factorization
[Figure: the selected user vector (UA1–UA3, UF1–UF3) and item vector (IA1–IA2, IF1–IF3) concatenated as network input, e.g. for user U5 and item I3.]

We concatenate the user and item vectors and use the result as input to the neural network, which predicts a single scalar value. For training we need both positive and negative samples.
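The lookup–concatenate–MLP forward pass can be sketched in NumPy. All matrices here are random stand-ins for learned parameters, and the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, n_factors = 9, 7, 3
n_uattrs, n_iattrs = 3, 2          # UA1..UA3 and IA1..IA2 from the figure

U_fact = rng.normal(size=(n_users, n_factors))
I_fact = rng.normal(size=(n_items, n_factors))
U_attr = rng.normal(size=(n_users, n_uattrs))  # stand-in user features
I_attr = rng.normal(size=(n_items, n_iattrs))  # stand-in item features

d_in = 2 * n_factors + n_uattrs + n_iattrs
W1, b1 = rng.normal(size=(d_in, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def predict(u, i):
    # "Lookup layer": select this user's and item's rows, concatenate
    # factors with additional features, then run a small MLP to one scalar.
    x = np.concatenate([U_attr[u], U_fact[u], I_attr[i], I_fact[i]])
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return (h @ W2 + b2).item()

p = predict(4, 2)   # user U5, item I3, as highlighted in the figure
```

In the real model the factor matrices and MLP weights are trained jointly, e.g. with the WARP loss in TensorFlow.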
Our data is a sequence of events
[Table: the interaction matrix with its positive events listed in time order: (U1, I2), (U2, I2), (U3, I1), (U1, I6), (U2, I5), (U2, I6).]
Network can do more
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Using a recurrent network, we can take the timeline of events into account.
LSTM based model
[Figure: user U2's clicks I2, I5, I6 fed step by step into the network, which predicts the next item I?.]

A Recurrent Neural Network (RNN) allows us to predict the next user–item interaction from the previous ones.
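A minimal sketch of the sequential model; a plain tanh RNN cell stands in for the LSTM to keep it short, and the weights are random, untrained placeholders:

```python
import numpy as np

rng = np.random.default_rng(4)
n_items, d = 7, 4                       # items I1..I7, hidden size 4

E  = rng.normal(size=(n_items, d))      # item embedding per click
Wx = rng.normal(size=(d, d)) * 0.1
Wh = rng.normal(size=(d, d)) * 0.1
Wo = rng.normal(size=(d, n_items)) * 0.1

def predict_next(history):
    # Run the recurrent cell over the user's clicked items in order,
    # then score every candidate item for the next step.
    h = np.zeros(d)
    for item in history:
        h = np.tanh(E[item] @ Wx + h @ Wh)
    return int(np.argmax(h @ Wo))

nxt = predict_next([1, 4, 5])   # U2 clicked I2, I5, I6 -> which item next?
```

Swapping the cell for an LSTM (e.g. `tf.keras.layers.LSTM`) and training on the event sequences gives the model from the slide.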
What About
Deployment?
Deployment
• Train the model on a powerful machine with a GPU
• Use the model for inference on another machine
• Provide an API for inference
• Update the model periodically
Technologies: TensorFlow, Docker, NGINX, Flask,
uWSGI
For TensorFlow there is TensorFlow Serving, which
lets you do this, but we use another approach that
suits not only TensorFlow-based models.
Data analysis cycle
[Cycle: show recommendations to the user → count user engagement → predict the user's interests → repeat.]
Recommendation engine
workflow
[Diagram: the NTD front-end, Google Analytics, BigQuery (user history), MySQL (NTD content), the AI engine, and the NTD recommended list, connected in the recommendation workflow.]
TECHNICAL IMPLEMENTATION
(Thank you, Kannan, for this section.)
Technical Implementation – Initial Setup
Technical Implementation – How services are started
Technical Implementation – How user requests are handled
Technical Implementation – How Model Reloading Works
System properties
• The proposed architecture can serve several hundred
API calls per second.
• If you need more, you can scale horizontally by
replicating the inference machine.
• We take user and item features into account for the
cold start.
• After a new user–item engagement, you can update
inference quickly without retraining the whole model.
LSTM based model
[Figure: user U2's sequence I2, I5, I6 extended with a new engagement I7, fed into the network to predict the next item I?.]

After a new user–item engagement, you can update inference quickly without retraining the whole model.
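The fast update can be sketched as caching one recurrent state per user and folding each new event in; the names and shapes here are illustrative, not the production code:

```python
import numpy as np

rng = np.random.default_rng(5)
n_items, d = 7, 4
E  = rng.normal(size=(n_items, d))   # item embeddings (from the trained model)
Wx = rng.normal(size=(d, d)) * 0.1
Wh = rng.normal(size=(d, d)) * 0.1

user_state = {}                      # cached recurrent state per user

def on_engagement(user, item):
    # O(1) per event: fold the new click into the cached state,
    # with no retraining and no replay of the full history.
    h = user_state.get(user, np.zeros(d))
    user_state[user] = np.tanh(E[item] @ Wx + h @ Wh)

on_engagement("U2", 6)   # U2 just clicked I7; the cached state now reflects it
```

Recommendations can then be served from the cached state immediately, while full model retraining happens on the periodic schedule described above.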
Agenda
 BIO
 Motivation
 Matrix factorization model with time
 Model for cold start problem
 Recommender based on neural network
 Deployment
 Conclusion
Conclusion
 We have introduced a model for an implicit-feedback recommender
system.
 To capture the temporal dynamics of the news domain, we
successfully apply a heuristic to the factorization model.
 For our factorization model, we implement the WARP loss and
sampling procedure on TensorFlow.
 The model was evaluated on our news dataset, which we have
made publicly available.
 We solved the cold start problem for new users and new items.
 We implemented a sequential news recommender based on an
LSTM.
 We deployed our model to support several hundred inference calls
per second.
Acknowledgment
• The Young Faculty Development Programme of the
high-potential group «Future professoriate» at the
National Research University Higher School of
Economics, for supporting my internship at NTU
• New Tang Dynasty Television (ntd.tv), for the
opportunity to be a part of the team, and for data and
equipment
• Kannan Sankaran, for great full-stack technical
support
Thank you for your attention!
Nikolay Karpov
Email to me:
nkarpov@hse.ru