The document summarizes key topics from a recommender systems conference, including:
1. Many major companies like Netflix, Quora, and Amazon consider recommendations to be a core part of their user experience.
2. Adaptive and interactive recommendations were discussed, including how Netflix personalizes content rows based on a user's predicted mood.
3. Text modeling algorithms like word2vec were discussed for generating recommendations from content like tweets, search queries, or product descriptions.
1. Highlights from Recommender Systems Conference
Boston, MA, USA, 15th-19th September 2016
Mindis Zickus, https://www.dunnhumby.com/
2. Topics
1. Everything is a recommendation at Netflix, Quora, Amazon,…
2. Adaptive and interactive recommendations
3. Text modelling algorithms for recommendations
4. Explore-exploit dilemma
5. Models to generate features: Ranking content in the news feed at Facebook
6. Deep learning is disrupting recommenders
7. Models in production
8. Pin recommendation at Pinterest
9. Contextual Turn
10. Interesting Papers, slides, algorithms
8. Algorithmic recommendations support human designers at StitchFix, which designs personalized clothing:
1. FILL OUT YOUR STYLE PROFILE: tell your personal stylist about your fit, size and style preferences.
2. RECEIVE A FIX DELIVERY: get 5 pieces of clothing delivered to your door.
3. KEEP WHAT YOU WANT: only pay for what you keep. Returns are easy and free.
http://www.slideshare.net/KatherineLivins/recsys-2016-talk-feature-selection-for-human-recommenders-66187739
9. Story recommendation for journalists at Schibsted
Algorithms recommend news stories to the journalists
Journalists can tune the freshness
12. Netflix orders content rows on the front page according to the user's predicted mode of watching
Rows of intent:
• Continuation: Resume a recently-watched TV show or movie
• List: Play a title previously added to My List
• Rewatch: Rewatch a title enjoyed in the past
• Discovery: Discover a new title to watch
http://www.slideshare.net/intotheminds/balancing-discovery-and-continuation-in-recommendation-hossein-taghavi-netflix
• Ordering of movies within rows: thematic coherence, relevancy
• Personalized personalization: levels of diversity
• Adaptive, intent-driven personalization
• The thumbnail image is personalised
13. A model reorders unseen rows based on previous clicks: a graphical (Bayesian) model with Expectation-Maximization inference.
Unseen rows are also reordered in real time based on real-time behaviour.
14. https://www.amazon.com/stream
Recommended items are adaptively personalized and diversified at Amazon Stream.
Method:
(1) a Bayesian regression model for scoring the relevance of items while leveraging uncertainty,
(2) a submodular diversification framework that re-ranks the top-scoring items based on category, and
(3) personalized category preferences learned from the user's behavior.
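The diversification step can be sketched as a greedy submodular re-ranker. This is a hypothetical illustration, not Amazon's actual model: each pick adds the item's relevance score plus a diminishing-returns bonus for its category, so a category's second item earns a smaller bonus than its first.

```python
import math

def diversify(items, category_weight, k=5):
    """Greedily re-rank scored items, discounting repeated categories.

    Each pick adds relevance plus a submodular (diminishing) category
    bonus: sqrt(n+1) - sqrt(n) shrinks as more items of that category
    are already chosen.
    """
    chosen, picked_per_cat = [], {}
    pool = list(items)
    for _ in range(min(k, len(pool))):
        def gain(it):
            n = picked_per_cat.get(it["category"], 0)
            bonus = category_weight.get(it["category"], 1.0) * (
                math.sqrt(n + 1) - math.sqrt(n))
            return it["score"] + bonus
        best = max(pool, key=gain)
        pool.remove(best)
        picked_per_cat[best["category"]] = picked_per_cat.get(best["category"], 0) + 1
        chosen.append(best)
    return chosen
```

With two close-scoring items from the same category and a slightly weaker item from another, the re-ranker interleaves categories instead of returning the raw score order.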
18. Content recommendation at RoverApp (formerly Flipora)
1. Define a topic hierarchy (3,000 topics), e.g. Sports/Racing/Formula1
2. Define entities within topics: Schumacher, Obama
3. Crawl the web to get pages, or use publishers' content
4. Assign each incoming document to topics and entities (sparse SVM)
5. Define the user's interest profile as the topics and entities consumed, with some decay (a 15,000-dimensional vector)
6. Find the most similar docs for the user and recommend them
7. Measure CTR and update recommendations based on CTR
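Steps 5 and 6 can be sketched with a decayed sparse profile and cosine similarity; the topic names and the decay value are illustrative, not RoverApp's actual implementation:

```python
def update_profile(profile, doc_topics, decay=0.9):
    """Decay existing interests, then add the topics of the newly read doc."""
    new = {t: w * decay for t, w in profile.items()}
    for t, w in doc_topics.items():
        new[t] = new.get(t, 0.0) + w
    return new

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Reading a Formula 1 page, then a politics page: the older interest decays.
p = update_profile({}, {"Sports/Racing/Formula1": 1.0})
p = update_profile(p, {"Politics/Obama": 1.0})
```

Recommending then reduces to scoring candidate documents' topic vectors against the profile with `cosine` and taking the top results.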
21. Many industry recommenders are based on, or benefit from, text information.
Many items have some text attributes or can be solely defined by text:
• Tweets
• Search queries
• SMS messages
• Conversations
• Product descriptors
• Web pages
• Stories
• Blogs
• News
• Q&A
• Reviews
Methods:
• Similarity (bag of words, TF-IDF)
• Topic discovery with unsupervised learning (LDA)
• Dynamics of topics
• Taxonomies or knowledge graphs of topics
• Entities (named entity recognition)
• Sentiment
• Sequence (word2vec)
• Embedding
• Mapping of user interests
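The first method (bag-of-words similarity with TF-IDF) fits in a few lines of plain Python; the corpus and tokens below are made up:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors for a small corpus of token lists."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] / len(doc) * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between sparse TF-IDF dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Two documents sharing informative terms score higher than documents with no overlap, which is the whole basis of this family of recommenders.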
22. Original word2vec captures a word's sequential co-occurrence patterns to predict sequences of words
• Creates a neural embedding (latent factors) of a word by predicting the other words in its neighborhood in a document
• The final objective is not the prediction itself but the word's weight vector in the hidden layer
23. Word2vec extensions for product recommendations
Yahoo, Prod2vec: predicts the next product in a purchase sequence
https://arxiv.org/pdf/1606.07154.pdf
Criteo, Meta-Prod2Vec: extends Prod2vec by leveraging item metadata; can be used for cold-start problems
https://arxiv.org/pdf/1607.07326v1.pdf
Microsoft, Item2Vec: predicts other products in the basket
https://arxiv.org/ftp/arxiv/papers/1603/1603.04259.pdf
24. You have to do “embedding”
• Every cool data scientist does “embedding” these days
• Embedding means mapping an item or user into another n-dimensional space
• Sparse to dense representation
• Reduces dimensionality
• The space can be clusters, latent factors, dimensions
• Embedding methods can be clustering, PCA, LDA, matrix factorization, neural methods (e.g. word2vec), deep learning
• Embeddings can be hierarchical
• Distances between items in the new space give similarity
• There may be many types of similarity (e.g. >20 at Facebook)
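The last two bullets in miniature: once items live in a dense space, the "similar" items are simply the nearest neighbours under cosine distance. The toy vectors below are invented:

```python
def cosine(u, v):
    """Cosine similarity between two dense vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def nearest(item, embeddings, k=2):
    """Return the k items closest to `item` in the embedded space."""
    scores = [(other, cosine(embeddings[item], vec))
              for other, vec in embeddings.items() if other != item]
    return [name for name, _ in sorted(scores, key=lambda s: -s[1])[:k]]

# Hypothetical 2-d embeddings: the two superhero titles land close together.
emb = {"dark_knight":   [0.90, 0.10],
       "batman_begins": [0.85, 0.20],
       "notting_hill":  [0.10, 0.90]}
```

Swapping in a different distance (or several, per the Facebook bullet) changes which notion of "similar" the recommender expresses, without touching the embedding itself.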
28. Explore-exploit dilemma for music recommendations at Pandora
• If the uncertainty (variance) about an item's relevancy is high, the optimal strategy is sometimes to explore: show high-uncertainty but lower-relevancy items to users in order to learn more about the item's true relevancy
• The challenge is how much to explore while avoiding WTF recommendations
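One standard way to trade relevance against uncertainty is an upper-confidence-bound rule (UCB1 here, as a generic sketch; the slide does not say which algorithm Pandora uses): an item's score is its estimated relevance plus a bonus that grows with uncertainty, so rarely-played items occasionally win.

```python
import math

def ucb1_pick(stats, t):
    """stats maps arm -> (plays, mean_reward); t is the total play count.

    Pick the arm with the highest upper confidence bound; unplayed arms
    are tried first (infinite bonus).
    """
    def ucb(arm):
        plays, mean = stats[arm]
        if plays == 0:
            return float("inf")
        return mean + math.sqrt(2 * math.log(t) / plays)
    return max(stats, key=ucb)
```

A barely-played item with a mediocre observed mean can outscore an established hit, but once it has been played often its bonus shrinks and exploitation takes over.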
29. Ticketmaster case study: a contextual-bandit approach to periodical personalized recommendations
http://delivery.acm.org/10.1145/2960000/2959139/p23-qin.pdf
Background: Ticketmaster is interested in pushing periodical personalized recommendations to users, as is common for many e-commerce companies today. In many cases, users are not motivated to visit websites or launch apps to see online recommendations. Periodical “pushing” of relevant products, such as weekly recommendation emails, SMS, and notifications, reminds users of the products, prompting purchases and further exploration of online content.
Challenge: how to refresh recommendations.
Contextual bandits:
1. Show completely random recommendations during the first batch.
2. Use the resulting feedback data from the first batch to initially train the models.
3. Publish the models, and use them to serve recommendations for the second batch.
4. Use the resulting feedback data from the second batch to update the models.
5. Repeat (3) and (4) with subsequent batches.
Improvement: use the hashing trick.
http://engineering.richrelevance.com/personalization-contextual-bandits/
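The "hashing trick" mentioned as the improvement maps an unbounded, growing feature vocabulary into a fixed-width vector, so the model never needs a feature dictionary. A minimal signed-hashing sketch (md5 is used here only for determinism; production systems use faster hashes):

```python
import hashlib

def hash_features(raw, dim=16):
    """Map arbitrarily many named sparse features into a fixed-size vector.

    The sign bit (taken from the hash) makes colliding features partly
    cancel rather than always add, reducing collision bias.
    """
    vec = [0.0] * dim
    for name, value in raw.items():
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        idx = h % dim
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0
        vec[idx] += sign * value
    return vec
```

New feature names (a new artist, a new city) need no schema change: they simply hash into the same fixed-width vector the model was trained on.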
30. Filter bubble in modelling: users see and click what the models recommend, and the models subsequently learn from interactions with recommendations generated by their previous versions.
31. 5. Models to generate features: Ranking content in the news feed at Facebook
http://conf.turi.com/lsrs16/wp-content/uploads/Komal_Kapoor_Ranking-and-Recommendation-for-Billions-of-Users.pptx
32. Feature selection (boosted decision trees)
• Prune to the most important features (~2K)
• Training time is proportional to number of examples × number of features
• Under-sample negative examples (impressions with no action) to help with the number of examples
• This reduces noise and results in simpler trees
• Do this for each feed event type: train many forests
• Historical counts and propensities are some of the strongest features
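Under-sampling negatives is normally paired with re-weighting the survivors by the inverse keep rate so that estimates stay unbiased; a generic sketch, not Facebook's code:

```python
import random

def undersample_negatives(examples, keep_rate, seed=0):
    """Keep all positives; keep each negative with probability keep_rate.

    Kept negatives get weight 1/keep_rate, so weighted statistics over
    the sample match expectations over the full data.
    """
    rng = random.Random(seed)
    out = []
    for x, label in examples:
        if label == 1:
            out.append((x, label, 1.0))
        elif rng.random() < keep_rate:
            out.append((x, label, 1.0 / keep_rate))
    return out
```

With a 10% keep rate, roughly a tenth of the negatives survive, each carrying weight 10, while every positive is retained untouched.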
33. Model training (logistic regression)
• We need to react quickly and incorporate new content, so use a simple model
• Logistic regression is simple, fast, and easy to distribute
• Treat the trees as feature transforms, each one turning the input features into a set of categorical features, one per tree
• Use logistic regression for online learning to quickly re-learn leaf weights
[Diagram: boosted trees over inputs F1, F2, F3; the boosted-tree weights are thrown out and only the leaf transforms are kept]
Input: (F1, F2, F3)
Output: (T1, T2), where T1 ∈ {leaves of tree 1}
34. Stacking: combined tree + LR model
• Main motivation: applying trees is computationally resource-intensive and slow, so reuse the click trees to predict likes, comments, etc.
• Only slightly more resource-intensive than independent models, with better prediction performance (transferred learnings)
[Diagram: ~thousands of raw features → thousands of tree transforms → sparse boolean features plus non-tree raw features → one predictor per event: click, like, comment, share, friend, outbound click, follow, hide]
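The "trees as feature transforms" idea in miniature: each tree routes an example to exactly one leaf, and that leaf index becomes a one-hot categorical feature for the downstream logistic regression. The two hand-written stumps and the feature names F1, F2 are placeholders:

```python
def tree_transform(trees, x):
    """One-hot encode each tree's leaf assignment for example x.

    `trees` is a list of (leaf_fn, n_leaves) pairs, where leaf_fn maps
    an example to its leaf index. The concatenated one-hots are the
    sparse boolean features fed to the logistic regression.
    """
    features = []
    for leaf_fn, n_leaves in trees:
        leaf = leaf_fn(x)
        features.extend(1.0 if i == leaf else 0.0 for i in range(n_leaves))
    return features

# Two hypothetical depth-1 "trees" (stumps), each with 2 leaves.
trees = [
    (lambda x: 0 if x["F1"] < 0.5 else 1, 2),
    (lambda x: 0 if x["F2"] < 10 else 1, 2),
]
```

Online learning then only touches the per-leaf logistic-regression weights; the trees themselves stay fixed between retrains.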
35. Other models + sparse features
• Train neural nets to predict events
• Discard the final layer and use the last layer's outputs as features
• Add sparse features such as text or content ID
[Diagram: raw features feed a forest and a neural network; their outputs, together with sparse features, feed a logistic regression predicting click, like, comment, share, hide, outbound click, fan/follow, friend]
36. Facebook: a chain of probabilities to measure ultimate value
Funnel: recommendation impression → recommendation conversion → page post impression → page post engagement
P(engagement | impression) = P(conversion | impression) × P(post impression | conversion) × P(engagement | post impression)
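The decomposition is just a product of the stage-wise conditionals along the funnel; with made-up rates:

```python
def p_engagement_given_impression(p_conv_given_imp,
                                  p_postimp_given_conv,
                                  p_engage_given_postimp):
    """Chain rule along the funnel: multiply the conditional probabilities."""
    return p_conv_given_imp * p_postimp_given_conv * p_engage_given_postimp

# Hypothetical rates: 10% convert, 50% of converters see a post, 20% engage.
p = p_engagement_given_impression(0.1, 0.5, 0.2)  # 1% end-to-end
```

The practical upside of the factorisation is that improving any single stage multiplies straight through to the end-to-end value.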
37. Learnings
• Data freshness matters: simple models allow for online learning and rapid response
• Feature generation is part of the modeling process
• Stacking supports plugging in new algorithms and features easily, and works very well in practice
• Use skewed sampling to manage high data volumes
• Historical counters provide highly predictive features that are easy to update online
39. Machine learning requires feature engineering that transforms the raw data (such as the pixel values of an image, or transactions) into a feature vector from which the machine-learning subsystem can classify patterns in the input.
Deep learning has multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned.
http://www.slideshare.net/kerveros99/deep-learning-for-recommender-systems-budapest-recsys-meetup
https://www.yammer.com/dunnhumby.com/#/uploaded_files/69393183?threadId=775785880
40. Many companies try to use DL in production. Last year there were zero deep-learning papers at RecSys; this year ~25% are DL applications.
• DL pros: can deal with different types of input data (raw data, text, images, sequences); can handle cold start
• DL cons: black box; many parameters to tune (e.g. you need another modelling system for tuning)
• Instead of feature engineering, we now have architecture engineering
DL papers at RecSys:
• Convolutional Matrix Factorization for Document Context-Aware Recommendation, by Donghyun Kim, Chanyoung Park, Jinoh Oh, Sungyong Lee, Hwanjo Yu
• Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations, by Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, Domonkos Tikk
Materials from the DL workshop at RecSys:
http://dlrs-workshop.org/dlrs-2016/program/
41. Google uses DL for YouTube recommendations; the DL model still uses features defined by experts. Google mentioned that it expects to move all modelling to a common platform based on TensorFlow.
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf
42. RNNs: the artificial neurons (for example, hidden units grouped under node s, with values s_t at time t) get inputs from other neurons at previous time steps (represented in the diagram by a black square, a delay of one time step).
In this way, a recurrent neural network can map an input sequence with elements x_t into an output sequence with elements o_t, with each o_t depending on all the previous x_t′ (for t′ ≤ t). The same parameters (matrices U, V, W) are used at each time step.
A good article about DL and RNNs:
https://www.yammer.com/dunnhumby.com/#/uploaded_files/69393183?threadId=775785880
http://home.elka.pw.edu.pl/~btwardow/recsys2016_btwardow_ACCEPTED.pdf
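A scalar toy version of the recurrence makes the parameter sharing concrete: the same u, v, w are applied at every step, and the state s carries history forward, so an output can be non-zero even when the current input is zero. (Real RNNs use the matrices U, V, W; scalars are used here purely for readability.)

```python
import math

def rnn_forward(xs, u, v, w, s0=0.0):
    """Unroll s_t = tanh(u*x_t + w*s_{t-1}), o_t = v*s_t over a sequence.

    The same parameters u, v, w are reused at every time step, so each
    o_t depends on all earlier inputs through the state s.
    """
    s, outputs = s0, []
    for x in xs:
        s = math.tanh(u * x + w * s)
        outputs.append(v * s)
    return outputs
```

With w = 0 the network forgets instantly (a zero input gives a zero output); any non-zero w lets earlier inputs leak into later outputs, which is exactly the property session-based recommenders exploit.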
43. DL to combine usage data and an item's text information in a single model
• https://arxiv.org/abs/1609.02116
46. Model Accuracy vs
• Speed and complexity of scoring
• Transparency
• Cost of training and deriving features
• Ability to explain recommendations to user
• Causal effects
• Predicting the right metrics
48. Quora's production machine learning uses Luigi to run model-training workflows
Models are trained on a single machine
49. Feature generation framework at Netflix
When experimenters design new feature encoders (functions that take raw data as input and compute features) they can immediately use them to compute new features for any time in the past, since the time machine can retrieve the appropriate snapshots and pass them to the feature encoders.
http://techblog.netflix.com/2016/02/distributed-time-travel-for-feature.html
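The contract described here is simply "encoder(snapshot) → feature", which makes back-testing trivial: point a brand-new encoder at any stored snapshot. A minimal sketch with an invented in-memory snapshot store:

```python
# Hypothetical snapshot store: date -> raw member data as of that date.
snapshots = {
    "2016-01-01": {"plays": ["A", "B"], "list": ["C"]},
    "2016-02-01": {"plays": ["A", "B", "C"], "list": []},
}

def n_plays_encoder(raw):
    """A feature encoder: raw snapshot in, feature value out."""
    return len(raw["plays"])

def compute_feature(encoder, date):
    """'Time travel': apply any encoder to the snapshot from any past date."""
    return encoder(snapshots[date])
```

Because encoders never see anything but the snapshot, a feature invented today can be computed as it would have looked on any historical date, with no risk of leaking future data into training.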
50. Everyone uses two-stage scoring!!!!
Stage 1: candidate retrieval; aim for high recall, get thousands of item
candidates.
Stage 2: reranking based on more sophisticated models, real-time
context, and the user’s feedback.
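The two-stage pattern can be sketched as follows. All scoring functions here are stand-ins (a popularity table for the cheap stage, a toy penalty for already-seen items in the expensive stage); real systems plug in their own retrieval indices and ranking models.

```python
popularity = {"a": 5, "b": 9, "c": 1, "d": 7}  # toy catalog with cheap scores

def retrieve_candidates(user, catalog, k=1000):
    # Stage 1: cheap, high-recall scoring over the whole catalog.
    return sorted(catalog, key=lambda item: -popularity[item])[:k]

def expensive_score(user, item):
    # Stand-in for a sophisticated model using real-time context.
    return popularity[item] * (0.5 if item in user["seen"] else 1.0)

def rerank(user, candidates, k=10):
    # Stage 2: the expensive model runs only on the short candidate list.
    return sorted(candidates, key=lambda item: -expensive_score(user, item))[:k]

user = {"seen": {"b"}}
top = rerank(user, retrieve_candidates(user, popularity, k=3))
```

The point of the split is cost: the expensive model scores thousands of candidates rather than the full catalog.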
52. 2 stages of item ranking at eBay
1) Recall, which requires retrieving candidate items
that might be similar to the given seed item,
2) Ranking, which sorts the candidates according to
their probability of being purchased.
The input to the algorithm comes as an HTTP request
to the merchandising backend (MBE) system with a
given seed item. This initiates parallel calls to several
services which return candidate recommendations
that are similar in some way to the seed. The set of
candidate recommendations is then ranked in real
time. The output of the system is the top 5 ranked
items, which are surfaced to the user.
53. Netflix has shown that unless your dataset is huge, distributed model training is not faster
than training with well-optimized code on a single machine.
http://www.slideshare.net/moustaki/some-pitfalls-of-distributed-learning
54. Argument for Scala to bridge data science and
production engineering
Some companies (Verizon, ASOS, Credit Karma) are adopting Scala as a
universal language for both data analysis and production.
Why Scala:
• Functional language; natural for writing data transformation pipelines
• Can use Java libraries
• Spark is written in Scala
Similar to the “DevOps” movement to integrate software
development and operations.
59. ● Both ML engineers and data scientists are involved in machine
learning
● ML engineers build, implement, and maintain production
machine learning systems.
● Data scientists conduct research to generate ideas about
machine learning projects, and perform analysis to understand
the metrics impact of machine learning systems.
Data Science ways of working at Quora
https://www.quora.com/What-is-the-difference-between-a-machine-learning-engineer-and-a-
data-scientist-at-Quora
61. Related Pins system at Pinterest
1. Candidate generation
• Signals derived from curation,
visual similarity, topic vectors,
etc.
• Rough estimate of what is
“related”
• Generates N candidates
(thousands)
2. Ranking
• Machine-learned ranking
model applied to the candidate set
3. Serving
• Online real-time ranking and
serving
https://arxiv.org/pdf/1511.04003.pdf
64. Pinterest: to avoid the filter bubble, serves a small group of users
random Pins and uses that data to build models
65. Pinterest: real-time ranking is done with a random forest, using a
parallelized, distributed C++ implementation of RF scoring
70. Contextual recommendations
• Recommendations don’t have to be personal
• The majority of recommenders used in industry are item-item (non-
personalized)
• An increasing number of recommenders are session-based
• When a user is searching for a new item, what other users did in this
situation matters more than what the user did previously
https://home.deib.polimi.it/pagano/portfolio/papers/TheContextualTurn.pdf
71. Importance of personalization
• The value of personalization depends on how broad the user’s intent is.
• The broader the intent, the more opportunity for personalization.
• “Running shoes” can be personalized if we know the user’s gender.
• Personalization can be framed as re-ranking with the user as context.
76. At Quora, the value of showing a story to a user is approximated
by a weighted sum of action values
77. Multi-objective recommendations
At Facebook, different actions have different
significance.
Given a potential story, how good is it?
Express it as the probability of a click, like, comment, etc.,
and assign different weights to different events
according to their significance.

Event     Probability   Value*
Click     5.1%          1
Like      2.9%          5
Comment   0.55%         20
Share     0.00005%      40
Friend    0.00003%      50
Hide      0.00002%      -100
Total     0.306
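The total in the table above is simply the expected value of the weighted actions. A quick check with the slide's illustrative numbers (these are examples from the talk, not Facebook's real weights):

```python
# event: (probability, value), taken from the slide's table
events = {
    "click":   (0.051,     1),
    "like":    (0.029,     5),
    "comment": (0.0055,   20),
    "share":   (0.0000005, 40),
    "friend":  (0.0000003, 50),
    "hide":    (0.0000002, -100),
}

# Expected value of showing the story: sum of probability * value
story_value = sum(p * v for p, v in events.values())
print(round(story_value, 3))  # 0.306, matching the slide's total
```

The click, like, and comment terms dominate; the rare events barely move the total.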
79. Best paper of RecSys: Local Item-Item Models for
Top-N Recommendation
• Original SLIM model: item-to-
item similarity weights can be
learned by regressing the purchase
indicator of every item rj (0/1)
on the other items that have been
purchased by users.
• Improved SLIM model: by
using different item-item
models for user subsets,
we can capture differences in
their preferences, which can
lead to improved performance
for top-N recommendations.
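The per-item regression behind SLIM can be sketched as follows. This is a rough illustration only: plain least squares stands in for SLIM's regularized, non-negative fit, the user-item matrix is made up, and the zero-diagonal constraint is imposed by hand.

```python
import numpy as np

# Toy users x items purchase matrix (1 = purchased)
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

n_items = R.shape[1]
W = np.zeros((n_items, n_items))   # item-item similarity weights
for j in range(n_items):
    X = R.copy()
    X[:, j] = 0.0                  # regress item j's column on the *other* items
    W[:, j], *_ = np.linalg.lstsq(X, R[:, j], rcond=None)
    W[j, j] = 0.0                  # enforce the zero diagonal

scores = R @ W                     # predicted affinities for ranking
```

Item 1's column perfectly predicts item 0's in this toy data, so the learned weight W[1, 0] comes out near 1, which is the kind of co-purchase signal SLIM captures.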
80. Extracting Food Substitutes
From Food Diary via
Distributional Similarity
• Foods that are consumed in
similar contexts are more
likely to be similar dietarily.
• For example, a turkey
sandwich can be considered
a suitable substitute for a
chicken sandwich if both
tend to be consumed with
french fries and salad.
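The distributional idea above can be made concrete with context co-occurrence vectors: each food is represented by counts of what it is consumed with, and similar vectors suggest substitutes. The foods, contexts, and counts below are invented for illustration.

```python
import numpy as np

# Columns: co-occurrence counts with ["fries", "salad", "soda", "rice"]
meals = {
    "turkey sandwich":  [30, 20, 10, 0],
    "chicken sandwich": [28, 22, 9, 1],
    "sushi":            [0, 5, 2, 40],
}

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Foods eaten in similar contexts score as likely substitutes.
sim_sub = cosine(meals["turkey sandwich"], meals["chicken sandwich"])
sim_far = cosine(meals["turkey sandwich"], meals["sushi"])
```

The two sandwiches share almost the same context profile and score near 1, while sushi's rice-heavy contexts push its similarity close to 0.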
81. List of algorithms used by presenters
• Logistic regression, Bayesian priors, caching, L1/L2 regularization, VW with FTRL
• GBDT, XGBoost
• RankLib
• MF, libFM, field-aware FM
• LDA (collapsed Gibbs sampling)
• Deep learning: RNN, CNN
• word2vec: prod2vec, item2vec
• Graphical Bayesian models
83. LiRa: A New Likelihood-Based Similarity Score
For Collaborative Filtering
• https://arxiv.org/pdf/1608.08646v1.pdf
84. Submodularity to mathematically control diversity
• Adding an item from a different cluster gives more value than one from the
same cluster
Adaptive, Personalized Diversity for Visual Discovery at Amazon
http://dl.acm.org/citation.cfm?id=2959171
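A toy greedy selection shows the diminishing-returns behavior described above: once a cluster is covered, further items from it contribute less marginal value. The items, clusters, relevance scores, and the 0.3 discount are all made up for illustration, not Amazon's objective.

```python
# item: (cluster, relevance) -- invented example data
items = {
    "a1": ("shoes", 0.9),
    "a2": ("shoes", 0.8),
    "b1": ("shirts", 0.7),
}

def marginal_gain(item, chosen):
    cluster, rel = items[item]
    covered = {items[i][0] for i in chosen}
    # Diminishing returns: relevance is discounted once the cluster is covered.
    return rel * (0.3 if cluster in covered else 1.0)

# Standard greedy maximization of a submodular-style objective.
chosen = []
for _ in range(2):
    best = max((i for i in items if i not in chosen),
               key=lambda i: marginal_gain(i, chosen))
    chosen.append(best)
```

Even though "a2" has higher raw relevance than "b1", the greedy step picks "b1" second because it covers a new cluster, which is the diversity effect submodularity buys.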
85. Negative sampling is still an art
• Observational data are implicit: we know what the user likes, but not what they dislike
• What the user actually has seen or is aware of but intentionally hasn’t
clicked
• Popular items that were not clicked
• No single method works; you have to try what works for your data