Deep Recommender Systems
Gabriel Moreira • @gspmoreira
Lead Data Scientist, PhD Candidate
PAPIs.io LatAm 2018
Introduction
"We are leaving the Information Age and entering the Recommendation Age."
Chris Anderson, "The Long Tail"
Introduction
Recommender Systems are responsible for:
● 37% of sales
● 2/3 of watched movies
● 38% of top news visualizations
Recommender System Methods
Collaborative Filtering based on Matrix Factorization
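In matrix factorization CF, a predicted rating is the dot product of a user's and an item's latent factor vectors. A minimal NumPy sketch with toy factor matrices (the values are illustrative and assumed already learned, e.g. by ALS or SGD):

```python
import numpy as np

# Toy latent factors (2 users x 3 items, k=2 latent dimensions)
P = np.array([[1.0, 0.2],
              [0.1, 0.9]])          # user factors
Q = np.array([[1.0, 0.0],
              [0.8, 0.3],
              [0.0, 1.0]])          # item factors

# Predicted rating matrix: each entry is the dot product p_u . q_i
R_hat = P @ Q.T
print(R_hat.shape)   # (2, 3)
print(R_hat[0, 0])   # prediction for user 0, item 0 -> 1.0
```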
Going deeper...
Why does Deep Learning have potential for RecSys?
● Feature extraction directly from the content (e.g., image, text, audio)
● Heterogeneous data handled easily
● Dynamic behaviour modeling with RNNs
● More accurate representation learning of users and items
○ Natural extensions of CF
● RecSys is a complex domain
○ Deep learning worked well in other complex domains
The Deep Learning era of RecSys
● 2007 - Restricted Boltzmann Machines for rating prediction
● 2015 - calm before the storm; a few seminal papers
● 2016 - First DLRS workshop and papers at RecSys, KDD, SIGIR
● 2017-2018 - Continued increase
Research directions in DL-RecSys
● Learning item embeddings
● Deep Collaborative Filtering
● Feature extraction directly from the content
● Session-based recommendations with RNNs
...and their combinations
Learning Item Embeddings
Item embeddings
● Embedding: a (learned) real value vector representing an entity
● Also known as Latent feature vector / (Latent) representation
● Similar entities’ embeddings are similar
● Use in recommenders:
○ Initialization of item representation in more advanced
algorithms
● Item-to-item recommendations
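Item-to-item recommendation with embeddings reduces to a nearest-neighbour search in the embedding space. A minimal NumPy sketch with toy vectors (the item names and embedding values are illustrative only):

```python
import numpy as np

# Toy item embeddings (one row per item), e.g. learned by a recommender
items = ["shoe", "sneaker", "laptop"]
E = np.array([
    [0.9, 0.1, 0.0],   # shoe
    [0.8, 0.2, 0.1],   # sneaker: close to shoe
    [0.0, 0.1, 0.9],   # laptop: far from both
])

def most_similar(item_idx, E, k=1):
    """Return indices of the k nearest items by cosine similarity."""
    norms = np.linalg.norm(E, axis=1)
    sims = E @ E[item_idx] / (norms * norms[item_idx])
    sims[item_idx] = -np.inf          # exclude the query item itself
    return np.argsort(-sims)[:k]

print(items[most_similar(0, E)[0]])   # nearest neighbour of "shoe"
```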
Learning Item Embeddings
Prod2Vec (Grbovic et al., 2015)
● Based on word2vec and paragraph2vec
● Skip-gram model on products
○ Input: i-th product purchased by the user
○ Context: the other purchases of the user
● Learning user representation
○ Follows paragraph2vec
○ User embedding added as global context
○ Input: user + products purchased except for
the i-th
○ Target: i-th product purchased by the user
User embeddings are used to produce predictions
prod2vec skip-gram model
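The skip-gram setup above can be illustrated by how training pairs are generated from a purchase sequence. A minimal sketch (the `window` parameter is an assumption for illustration; Prod2Vec itself may treat all of a user's other purchases as context):

```python
def skipgram_pairs(purchases, window=2):
    """Generate (input, context) training pairs from one user's
    purchase sequence, as in a skip-gram model over products."""
    pairs = []
    for i, target in enumerate(purchases):
        for j in range(max(0, i - window), min(len(purchases), i + window + 1)):
            if j != i:
                pairs.append((target, purchases[j]))
    return pairs

print(skipgram_pairs(["p1", "p2", "p3"], window=1))
# -> [('p1', 'p2'), ('p2', 'p1'), ('p2', 'p3'), ('p3', 'p2')]
```

Each pair trains the model to predict a co-purchased product from the target product, which is what makes nearby products end up with similar embeddings.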
Feature extraction from content
for Recommender Systems
Feature extraction from unstructured data
● Images: CNN
● Text: 1D CNN, RNNs, weighted word embeddings
● Audio/Music: CNN, RNN
Content Features in Hybrid Recommenders
● Initializing
○ Obtain item representation based on metadata
○ Use this representation as initial item features
● Regularizing
○ Obtain metadata-based representations
○ The interaction-based representation should be close to the metadata-based one (the difference is added as a regularizing term to the loss)
● Joining
○ Have the item feature vector be a concatenation of:
■ a fixed metadata-based part
■ a learned interaction-based part
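The "regularizing" and "joining" strategies above can be sketched in NumPy (the function names and the `lam` weight are illustrative, not from any specific system):

```python
import numpy as np

def regularized_item_loss(v_interaction, v_metadata, base_loss, lam=0.1):
    """'Regularizing' hybrid: penalize the distance between the
    interaction-based item vector and its metadata-based vector."""
    return base_loss + lam * np.sum((v_interaction - v_metadata) ** 2)

def joined_item_vector(v_metadata_fixed, v_learned):
    """'Joining' hybrid: concatenate a fixed metadata-based part
    with a learned interaction-based part."""
    return np.concatenate([v_metadata_fixed, v_learned])

v_meta = np.array([1.0, 0.0])
v_cf = np.array([0.8, 0.1])
print(regularized_item_loss(v_cf, v_meta, base_loss=0.5, lam=0.1))
print(joined_item_vector(v_meta, v_cf).shape)   # (4,)
```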
Wide & Deep Learning
Wide & Deep Learning (Cheng et al., 2016)
● Joint training of two models
○ Deep Neural Network - focused on generalization
○ Linear Model - focused on memorization
● Improved online performance
○ +2.9% deep over wide
○ +3.9% wide & deep over wide
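The joint scoring can be sketched as the sum of a linear (wide) logit and a DNN (deep) logit passed through a sigmoid. A minimal NumPy sketch with random toy weights (shapes and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_and_deep_predict(x_wide, x_deep, w_wide, W1, w_out, b=0.0):
    """Sketch of joint Wide & Deep scoring: the final logit is the sum
    of the linear (wide) part and a one-hidden-layer DNN (deep) part."""
    wide_logit = x_wide @ w_wide
    hidden = np.maximum(0.0, x_deep @ W1)   # ReLU hidden layer
    deep_logit = hidden @ w_out
    return sigmoid(wide_logit + deep_logit + b)

rng = np.random.default_rng(0)
x_wide = rng.normal(size=5)    # e.g. crossed sparse features
x_deep = rng.normal(size=8)    # e.g. dense embeddings
p = wide_and_deep_predict(x_wide, x_deep,
                          rng.normal(size=5),
                          rng.normal(size=(8, 4)),
                          rng.normal(size=4))
print(0.0 < p < 1.0)  # True: a valid click probability
```

In TensorFlow this joint training is what the `DNNLinearCombinedClassifier` estimator mentioned below packages up.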
Deep Collaborative Filtering
Outbrain Click Prediction - Kaggle competition
Dataset
● Sample of users' page views and clicks during 14 days in June 2016
● 2 Billion page views
● 17 million click records
● 700 Million unique users
● 560 sites
Can you predict which recommended content each user will click?
Wide & Deep Model example
Outbrain Click Prediction - Data Model
Numerical
Spatial
Temporal
Categorical
Target
Wide & Deep Model example
Source: https://github.com/gabrielspmoreira/kaggle_outbrain_click_prediction_google_cloud_ml_engine
DNNLinearCombinedClassifier estimator
Wide & Deep Model example
Source: https://github.com/gabrielspmoreira/kaggle_outbrain_click_prediction_google_cloud_ml_engine
Wide and Deep features
Wide & Deep Model example
Source: https://github.com/gabrielspmoreira/kaggle_outbrain_click_prediction_google_cloud_ml_engine
POC Results
Framework / Platform          | Model             | Mean Average Precision (MAP)
Vowpal Wabbit                 | Linear            | 0.6751
TensorFlow / Google ML Engine | Linear (Wide)     | 0.6779
TensorFlow / Google ML Engine | Deep model        | 0.6674
TensorFlow / Google ML Engine | Wide & Deep model | 0.6765
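The MAP metric in the table can be illustrated for the single-click-per-display case (as in the Outbrain competition), where average precision reduces to the reciprocal rank of the clicked item:

```python
def average_precision(ranked_items, clicked):
    """AP for one recommendation list with a single clicked item:
    1 / (rank of the clicked item), 0 if it is absent."""
    for rank, item in enumerate(ranked_items, start=1):
        if item == clicked:
            return 1.0 / rank
    return 0.0

def mean_average_precision(rankings):
    """MAP over (ranked_list, clicked_item) pairs."""
    return sum(average_precision(r, c) for r, c in rankings) / len(rankings)

data = [(["a", "b", "c"], "a"),   # clicked item at rank 1 -> AP = 1.0
        (["a", "b", "c"], "c")]   # clicked item at rank 3 -> AP = 1/3
print(mean_average_precision(data))  # -> 0.666...
```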
Going even deeper...
News Recommender Systems
using Deep Learning
News Recommender Systems
News RS Challenges
● Sparse user profiling (Li et al., 2011; Lin et al., 2014; Peláez et al., 2016)
● Fast-growing number of items (Peláez et al., 2016; Mohallick & Özgöbek, 2017)
● Accelerated decay of item value (Das et al., 2007)
● Users' preference shifts (Peláez et al., 2016; Epure et al., 2017)
Research Objective
To investigate, design, implement, and evaluate a deep learning meta-architecture for news recommendation, in order to improve the accuracy of recommendations provided by news portals, satisfying readers' dynamic information needs in such a challenging recommendation scenario.
Conceptual model of factors affecting news relevance
● News article
○ Static properties: topics, entities, publisher
○ Dynamic properties: recency, popularity
● User
○ Current context: time, location, device, referrer
○ User interests: long-term and short-term
● Global factors: seasonality, breaking events, popular topics
[Chart] Recency factor influence: clicks by article age (days)
Meta-Architecture Requirements
● RQ1 - to provide personalized news recommendations in extreme cold-start scenarios, as most news items are fresh and most users cannot be identified
● RQ2 - to use Deep Learning to automatically learn news representations from textual content and news metadata, minimizing the need for manual feature engineering
● RQ3 - to leverage the user session information, as the sequence of
interacted news may indicate the user’s short-term preferences for
session-based recommendations
● RQ4 - to leverage user's past sessions information, when available, to
model long-term interests for session-aware recommendations
Meta-Architecture Requirements
● RQ5 - to leverage users' contextual information as a rich data source, given the scarcity of information about the user
● RQ6 - to explicitly model contextual news properties – popularity and recency – as these are important factors in the news interest life cycle
● RQ7 - to support an increasing number of new items and users by
incremental model retraining (online learning), without the need to
retrain on the whole historical dataset
● RQ8 - to provide a modular structure for news recommendation,
allowing its modules to be instantiated by different and increasingly
advanced neural network architectures and methods
CHAMELEON Meta-Architecture for News RS
[Diagram] The meta-architecture has two modules:
● Article Content Representation (ACR) - runs when a news article is published
○ Inputs: content word embeddings (e.g. "New York is a multicultural city, ...") and article metadata attributes (publisher, category, tags, entities)
○ Sub-modules: Textual Features Representation (TFR) and Metadata Prediction (MP)
○ Output: the Article Content Embedding
● Next-Article Recommendation (NAR) - runs when a user reads a news article
○ Inputs: article content embeddings, article context (popularity, recency), user context (time, location, device), user interactions (past read articles), active sessions and users' past sessions
○ Sub-modules: Contextual Article Representation (CAR), Session Representation (SR) and Recommendations Ranking (RR)
○ Intermediate outputs: User-Personalized Contextual Article Embedding and Predicted Next-Article Embedding
○ Output: recommended articles, ranked among candidate next articles (positive and negative) for the active article
CHAMELEON - ACR module
[Diagram] ACR module detail: from the content word embeddings (e.g. "New York is a multicultural city, ...") and the article metadata attributes (category, tags, entities), the Textual Features Representation (TFR) and Metadata Prediction (MP) sub-modules produce the Article Content Embedding when a news article is published.
t-SNE visualization of article embeddings colored by category, with similar articles highlighted (e.g. "Sports", "Berlin")
Article Embeddings Similarity Evaluation (e.g. "sports")
Article Content (Title | Kicker | Description) - Similarity
● Reference: "Commentary: Generation Courage | Commentary by Alexander Wölffing on the German handball team at the European Championship | The offensive approach of the German Handball Federation under Bob Hanning is paying off. Alexander Wölffing, deputy editor-in-chief and head of sports at SPORT1, comments." - (reference)
● "'Seahawks will work Brady over' | NFL legend Hines Ward analyzes the Super Bowl | Hines Ward favors Seattle. But the former MVP of the final sees Vollmer and co., as well as an X-factor, as the Patriots' chance. He analyzes the Super Bowl for SPORT1." - 0.970
● "No more 'Hack-a-Drummond' soon? | Fouls as a tactical tool: NBA wants to change the 'hack-a-player' rule | Andre Drummond and all poor free-throw shooters in the NBA will be pleased: the league wants to penalize tactical fouls differently so as not to ruin the game." - 0.964
● "Number 2: Leno comes to terms with it | Bernd Leno talks about Manuel Neuer in a SPORT1 interview | Bayer Leverkusen's Bernd Leno comments for SPORT1 on the goalkeeper situation in the German national team and on role model Manuel Neuer, and also talks about mistakes and frustration." - 0.962
CHAMELEON - NAR module
[Diagram] When a user reads a news article: the active article's content embedding, the article context (popularity, recency) and the user context (time, location, device) feed the Contextual Article Representation (CAR) sub-module, producing a User-Personalized Contextual Article Embedding. The Session Representation (SR) sub-module models the active session (past read articles) to output a Predicted Next-Article Embedding, and the Recommendations Ranking (RR) sub-module ranks candidate next articles (positive and negative) to produce the recommended articles. Active sessions and users' past sessions are kept in data repositories.
CHAMELEON - NAR module
Sessions in a batch: S1 = (I1,1 ... I1,5), S2 = (I2,1, I2,2), S3 = (I3,1, I3,2, I3,3)
For each session the input is the sequence of read articles and the expected output (labels) is the same sequence shifted by one position, e.g. for S1:
Input: I1,1 I1,2 I1,3 I1,4
Expected output (labels): I1,2 I1,3 I1,4 I1,5
CHAMELEON - NAR module
Negative Sampling strategy - negative samples are drawn from:
● a buffer of articles read by all users overall in the last hour
● articles read in other user sessions in the batch
The positive sample is the next article read by the user in their session.
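The sampling strategy above can be sketched as follows (the function name and the fixed seed are illustrative):

```python
import random

def sample_negatives(recent_clicks_buffer, batch_session_items, positive, n=3, seed=42):
    """Sketch of negative sampling: candidates come from the global
    buffer of recently clicked articles plus articles read in other
    sessions of the batch, excluding the true next click."""
    candidates = (set(recent_clicks_buffer) | set(batch_session_items)) - {positive}
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), min(n, len(candidates)))

negs = sample_negatives(["a1", "a2", "a3"], ["a4", "a5"], positive="a2", n=2)
print(negs)
```

Sampling negatives from recent clicks biases the model toward discriminating among currently relevant articles, which matters given the fast decay of news value.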
CHAMELEON - NAR module
Recommendations Ranking (RR) sub-module:
● Eq. 4 - Relevance score of an item for a user session
● Eq. 5 - Cosine similarity
● Eq. 6 - Softmax over relevance scores (HUANG et al., 2013)
● Eq. 7 - Loss function (HUANG et al., 2013)
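The equation bodies are not reproduced on the slide; based on the cited DSSM formulation (Huang et al., 2013), they presumably take the following form, where $\mathbf{s}$ is the predicted session embedding, $i$ ranges over the candidate articles $D$, and $\gamma$ is a smoothing factor:

```latex
% Eq. 4/5 - relevance score as cosine similarity
R(s, i) = \cos(\mathbf{s}, \mathbf{i}) =
  \frac{\mathbf{s}^{\top}\mathbf{i}}{\lVert\mathbf{s}\rVert \,\lVert\mathbf{i}\rVert}

% Eq. 6 - softmax over relevance scores
P(i \mid s) = \frac{\exp\bigl(\gamma \, R(s, i)\bigr)}
                   {\sum_{i' \in D} \exp\bigl(\gamma \, R(s, i')\bigr)}

% Eq. 7 - negative log-likelihood of the clicked (positive) items
\mathcal{L} = -\log \prod_{(s,\, i^{+})} P(i^{+} \mid s)
```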
CHAMELEON - NAR module loss function
Recommendation loss function implemented in TensorFlow
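The actual implementation is in TensorFlow in the linked repository; as a minimal NumPy sketch of the same idea (cosine relevance scores over one positive and sampled negatives, softmax, negative log-likelihood):

```python
import numpy as np

def ranking_loss(session_emb, pos_item_emb, neg_item_embs, gamma=10.0):
    """Cross-entropy over the positive item vs. sampled negatives,
    with cosine similarity as the relevance score."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = np.array([cos(session_emb, pos_item_emb)] +
                      [cos(session_emb, e) for e in neg_item_embs])
    logits = gamma * scores
    log_softmax = logits - np.log(np.sum(np.exp(logits)))
    return -log_softmax[0]   # positive item is at index 0

s = np.array([1.0, 0.0])
loss = ranking_loss(s, pos_item_emb=np.array([0.9, 0.1]),
                    neg_item_embs=[np.array([0.0, 1.0]), np.array([-1.0, 0.2])])
print(loss >= 0.0)  # True: the negative log-likelihood is non-negative
```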
CHAMELEON - Offline evaluation protocol
● Predict the next article the user will read in a session
● Train on interactions up to a given hour, then evaluate the predictions of next-item clicks on the sessions of the following hour
● Ranking metrics
○ Recall@3 - scores when the actually clicked article is among the top-3 items in the recommended ranking list
○ NDCG@3 - takes into account the predicted position of the actually clicked item (e.g. position 1 scores higher than position 2)
● Evaluation benchmarks
○ Popular Recent - keeps a buffer of all clicks in the last hour and returns the most popular articles in that time window
○ Co-occurrent - recommends the articles most commonly read together in user sessions
○ Content-Based - returns the articles whose content embeddings are most similar
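The two ranking metrics can be computed as follows for a single session click (since there is exactly one relevant item, the ideal DCG is 1 and needs no extra normalization):

```python
import math

def recall_at_k(ranked_items, clicked, k=3):
    """1 if the clicked article is among the top-k recommendations, else 0."""
    return 1.0 if clicked in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, clicked, k=3):
    """DCG with a single relevant item: 1/log2(rank+1) if it is in the
    top-k; the ideal DCG (clicked item at rank 1) equals 1."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == clicked:
            return 1.0 / math.log2(rank + 1)
    return 0.0

ranking = ["a7", "a2", "a9", "a1"]
print(recall_at_k(ranking, "a2"))  # -> 1.0 (clicked item in top-3)
print(ndcg_at_k(ranking, "a2"))    # -> 0.63... (rank 2: 1/log2(3))
```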
CHAMELEON - Preliminary Results
Recall@3 distribution over time
CHAMELEON - Preliminary Results
NDCG@3 distribution over time
CHAMELEON - Preliminary Results
Recall@3 distribution: 60.5%, 66.5%, 57%
CHAMELEON - Preliminary Results
Recall@3 by hour
CHAMELEON - Preliminary Results
Recall@3 by day
Next steps
● ACR module
○ Try different word embeddings (Word2Vec, GloVe)
○ Try different textual feature extractors (CNNs, RNNs) to generate article embeddings
● NAR module
○ Test different RNN architectures (LSTM, GRU)
○ Try different negative sampling strategies
○ Leverage users' past sessions to initialize RNNs for new sessions
○ Evaluate on different (and larger) news datasets
○ Hyperparameter tuning
References
https://www.slideshare.net/balazshidasi/deep-learning-in-recommender-systems-recsys-summer-school-2017
Resources
https://github.com/datadynamo/strata_ny_2017_recommender_tutorial
Resources
Deskdrop dataset shared on Kaggle
bit.ly/dataset4recsys
● 12 months of logs (Mar. 2016 - Feb. 2017)
● ~73k user interactions
● ~3k articles shared on the platform
Gabriel Moreira
Lead Data Scientist
gabrielpm@ciandt.com
@gspmoreira
We are hiring!
http://bit.ly/ds4ciandt

Deep Recommender Systems - PAPIs.io LATAM 2018
