Real-time personalized recommendations using product embeddings
About me - Jakub Macina @dmacjam
● MSc in Information Systems @ Slovak University of Technology
○ collaboration with edX
○ research paper at ACM RecSys 17
● open source contributor @ Discourse
○ Google Summer of Code
● Spine Hero - Microsoft Imagine Cup, Brilliant Young Entrepreneurs
● AI Engineer @ Exponea
Agenda
1. Motivation for recommender systems
2. Collaborative filtering
3. Challenges
4. Text-based recommenders
○ Term weighting
○ Word2vec
5. Product embeddings
○ Usage
○ Training
○ Examples
6. Conclusions
Information overload
More than 400 hours of video are uploaded every minute
More than 35 million songs available
Recommender system
● provides suggestions to users for items they might be interested in consuming or items that meet their needs
● more formally:
○ Estimate a utility function that automatically predicts how much a user will like an item.
Recommendations are everywhere
Value of recommendation
“Our recommender system is used on most screens of the Netflix product beyond
the homepage, and in total influences choice for about 80% of hours
streamed at Netflix. The remaining 20% comes from search [...]”
Carlos A. Gomez-Uribe and Neil Hunt. 2016. The Netflix recommender system: Algorithms, business value, and innovation.
ACM Transactions on Management Information Systems (TMIS) 6, 4 (2016), 13.
Value of recommendation
60% of video clicks come from homepage recommendations
James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston,
and Dasarathi Sampath. 2010. The YouTube Video Recommendation System. In Proceedings of the Fourth ACM Conference on Recommender
Systems (RecSys ’10). ACM, New York, NY, USA, 293–296.
Collaborative filtering
● Based on user’s past behaviour
Image by Erik Bernhardsson
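A minimal sketch of one classic flavour of the idea - item-item neighbourhood filtering over an explicit user-item rating matrix. The matrix and numbers below are illustrative, not from the talk:

import numpy as np

# Toy user-item rating matrix: rows = users, columns = items, 0 = not rated (illustrative).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Item-item similarities computed from the rating columns.
n_items = R.shape[1]
S = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)] for i in range(n_items)])

def predict(user, item):
    """Similarity-weighted average of the items this user already rated."""
    rated = R[user] > 0
    weights = S[item, rated]
    return weights @ R[user, rated] / (weights.sum() + 1e-9)

print(round(predict(user=0, item=2), 2))  # estimate of how user 0 would rate item 2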
Challenges
● Customers are not logged in while browsing a website - no history available
● Buying intent and preferences might change from visit to visit

Dataset | Users | Items | Matrix density
MovieLens 10M | 69 878 | 10 681 | 1.340%
Average fashion e-commerce | 100 000 - 500 000 | 1 000 - 50 000 | 0.012% - 0.155%
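Matrix density is simply the share of user-item pairs with an observed interaction. A quick sanity check of the MovieLens row, assuming its roughly 10 million ratings:

ratings = 10_000_054          # number of ratings in MovieLens 10M
users, items = 69_878, 10_681
density = ratings / (users * items)
print(f"{density:.3%}")       # ~1.340%, matching the table above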
1. Content-based recommendation
● Find similar items by analyzing content (texts, images, music, ...)
● Text analysis
1. Content-based recommendation
● Project each product into a low-dimensional space
● Compute similarities between them
Text preprocessing
● Stopword removal
● Part-of-speech (POS) tagging
import nltk
from nltk.corpus import stopwords  # requires nltk.download('stopwords')

review = "Great local atmosphere, tasty tapas and great selection of beers."
words = review.lower().split(" ")
print([word for word in words if word not in stopwords.words('english')])
>> ['great', 'local', 'atmosphere,', 'tasty', 'tapas', 'great', 'selection', 'beers.']

for sent in nltk.sent_tokenize(review):  # requires nltk.download('punkt')
    print(list(nltk.pos_tag(nltk.word_tokenize(sent))))  # and 'averaged_perceptron_tagger'
>> [('Great', 'NNP'), ('local', 'JJ'), ('atmosphere', 'NN'), (',', ','), ('tasty', 'JJ'),
    ('tapas', 'NN'), ('and', 'CC'), ('great', 'JJ'), ('selection', 'NN'), ('of', 'IN'),
    ('beers', 'NNS')]
Term weighting
● Weight = importance indicator of a term regarding content
Example: "Great local atmosphere, tasty tapas and great beer selection." becomes a term-count
vector of length |V| over the vocabulary, e.g. great = 2, local = 1, tapas = 1, beer = 1,
blue = 0, wine = 0, ...
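A minimal sketch of building such term-count vectors with scikit-learn, plus the common TF-IDF weighting (which down-weights terms that occur in many documents); the two documents are just the review snippets used on these slides:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "Great local atmosphere, tasty tapas and great beer selection.",
    "Wide variety of beers from all around the world.",
]

# Raw term counts over the vocabulary V - the representation sketched above.
counts = CountVectorizer().fit_transform(docs)

# TF-IDF weighting of the same vocabulary.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())  # vocabulary (scikit-learn >= 1.0)
print(X.toarray().round(2))           # one weighted vector per document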
Term weighting
● compares exact words across documents
● ignores word order
● what about synonyms and related words?
“Great selection of local beers.” vs. “Wide variety of beers from all around the world.” - similar meaning, but almost no exact words in common.
Word embeddings
Word embeddings
● Distributional hypothesis:
“You shall know a word by the company it keeps” (J. R. Firth 1957)
Word embeddings
● capture similarity between words, analogies, general syntactic and semantic
information
● unsupervised learning
● representing each word as a numeric vector = embedding
○ dense vectors - dimensionality is usually between 100 and 300
YOUNG, Tom, et al. Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709, 2017.
Word2vec
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural
information processing systems. 2013.
(Diagram: the context words "We", "tried", "food", "yesterday" and the target word "Italian" -
word2vec learns to predict one from the other over a sliding window.)
Word2vec with gensim
print(reviews[:3])
>>> [
['this', 'place', 'is', 'horrible'],
['i', 'was', 'impressed', 'there'],
['i', 'decided', 'to', 'try', 'it', 'turned', 'out', 'is', 'cheap', 'eat']
]

from gensim.models import Word2Vec

# sg=1 selects skip-gram; in gensim >= 4.0 the size and iter parameters are
# named vector_size and epochs.
word2vec_model = Word2Vec(reviews, sg=1, iter=10, size=100, window=5,
                          min_count=2, workers=4)

word2vec_model.wv['impressed']
>> array([ 0.2790776 , -0.3456704 , 0.23330563, ..., -0.11152197], dtype=float32)
Word2vec example
● Yelp Open Dataset - https://www.yelp.com/dataset
○ Businesses with reviews
○ Available for personal, educational, and academic purposes
Word2vec exploration
● http://projector.tensorflow.org/
● https://radimrehurek.com/gensim/scripts/word2vec2tensor.html
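One way to get the trained Yelp model into the Embedding Projector is to export it with gensim's word2vec2tensor script; a sketch with arbitrary file names, assuming the word2vec_model trained above:

# Save the trained vectors in plain word2vec text format.
word2vec_model.wv.save_word2vec_format('yelp_word2vec.txt', binary=False)

# Convert them to the projector's TSV format (run from the shell):
#   python -m gensim.scripts.word2vec2tensor -i yelp_word2vec.txt -o yelp
# This writes yelp_tensor.tsv and yelp_metadata.tsv, which can be uploaded
# at http://projector.tensorflow.org/ via "Load data".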
Examples - similar terms
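The similar-term examples come straight from the trained model's nearest neighbours; the query word is just an illustration:

# Top-5 terms whose vectors are closest to "beer" in the Yelp model.
print(word2vec_model.wv.most_similar('beer', topn=5))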
Examples - word analogies
Breakfast + lunch =
Wines - french + belgian =
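The analogy queries above map directly onto vector arithmetic via positive and negative terms; a sketch whose exact output depends on the trained model:

# breakfast + lunch  ->  nearest neighbours of the summed vectors
print(word2vec_model.wv.most_similar(positive=['breakfast', 'lunch'], topn=3))

# wines - french + belgian  ->  analogy-style query
print(word2vec_model.wv.most_similar(positive=['wines', 'belgian'], negative=['french'], topn=3))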
Word embeddings
● Word2vec
○ Google News (about 100 billion words)
○ https://code.google.com/archive/p/word2vec/
● fastText
○ character n-gram embeddings - handle out-of-vocabulary words
○ Wikipedia
○ https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
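Pretrained vectors can be loaded directly with gensim; a sketch assuming the standard Google News file name from the link above (the file is several gigabytes):

from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print(vectors['recommendation'][:5])  # first few dimensions of a 300-d vector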
Word embeddings
● Works great with texts
● Able to capture synonyms, words used in the same context
● Can find products similar to a given product
● But not very personalized
● Can we use this idea to provide more personalized recommendations?
(Example products: Green sweater Ma&Mi, Red sweater Ma&Mi, Khaki sweater Originals)
2. Product embeddings
● represent each product as a numeric vector
● products appearing in similar contexts get similar vectors
Example 100-dimensional product vector: [-0.6, 0.1, 0.3, 0.6, -0.3, ..., 0.7]
Usage
● Calculate similarity between any two products
Product A: [0.6, 2.1, 1.4, 0.1, 4.2, ..., 3.3]
Product B: [0.4, 1.7, 0.7, 0.3, 5.6, ..., 2.1]
Cosine similarity = 0.823
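Computing the similarity is a one-liner; a sketch with two made-up 100-dimensional product vectors:

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)  # stand-ins for real product embeddings
print(round(cosine_similarity(a, b), 3))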
In-session personalization using product embeddings
(Diagram: the last products viewed in a session are combined into a single session vector, and
candidate products are scored against it - e.g. 0.86, 0.24, 0.61.)
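A minimal sketch of that flow, assuming the embeddings live in a dict keyed by product id (all ids and vectors below are illustrative): average the vectors of the last viewed products and rank the rest of the catalogue by cosine similarity to that session vector.

import numpy as np

def session_vector(viewed_ids, embeddings):
    """Combine the last viewed products into one vector (simple average)."""
    return np.mean([embeddings[pid] for pid in viewed_ids], axis=0)

def recommend(viewed_ids, embeddings, top_n=3):
    """Score every not-yet-viewed product against the session vector."""
    s = session_vector(viewed_ids, embeddings)
    scores = {
        pid: float(v @ s / (np.linalg.norm(v) * np.linalg.norm(s)))
        for pid, v in embeddings.items() if pid not in viewed_ids
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

rng = np.random.default_rng(1)
embeddings = {pid: rng.normal(size=100) for pid in ['5846', '8743', '9635', '8745', '8011']}
print(recommend(['5846', '8743'], embeddings))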
Training - product views
● Views by users ordered in time
user 1: 5846 8743 9635 8745
user 2: 8011 1239 2310
user 3: 3324 9803
user 4: 6798 7129 5989
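Since these sessions have exactly the shape of sentences (ordered sequences of tokens), the same gensim call used for the Yelp reviews can be trained on product ids instead of words - the item2vec idea. A sketch with the illustrative sessions above, keeping the gensim 3.x parameter names from the earlier slide:

from gensim.models import Word2Vec

# Each "sentence" is one user's product views, ordered in time.
sessions = [
    ['5846', '8743', '9635', '8745'],
    ['8011', '1239', '2310'],
    ['3324', '9803'],
    ['6798', '7129', '5989'],
]

# Skip-gram over product ids yields one embedding per product.
product_model = Word2Vec(sessions, sg=1, size=100, window=5, min_count=1, iter=10, workers=4)

# Products that tend to be viewed in similar contexts.
print(product_model.wv.most_similar('5846', topn=3))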
Neural network architecture
Clustering of products
(2D projection of the product embeddings: one axis separates Women's from Men's products, the
other Casual from Formal, with example points at (-0.4, 0.7) and (0.8, 0.6).)
Model tuning
● filter out short clicks (accidental, not interesting) - see the sketch below
● negative sampling - instead of the default random sampling, draw negative examples from products of the same category
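A sketch of the first point, using a hypothetical dwell-time field on the raw view events (field names and the threshold are illustrative, not Exponea's schema); the category-aware negative sampling in the second point is not something gensim offers out of the box and typically needs a custom training loop.

MIN_DWELL_SECONDS = 5  # illustrative threshold for an "intentional" view

def build_session(events):
    """Keep only the product views the customer actually spent time on."""
    return [e['product_id'] for e in events if e['dwell_seconds'] >= MIN_DWELL_SECONDS]

events = [
    {'product_id': '5846', 'dwell_seconds': 42},
    {'product_id': '8743', 'dwell_seconds': 1},   # accidental click, dropped
    {'product_id': '9635', 'dwell_seconds': 18},
]
print(build_session(events))  # ['5846', '9635']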
Comparison with collaborative filtering
BARKAN, Oren; KOENIGSTEIN, Noam. Item2vec: neural item embedding for collaborative filtering. In: Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on. IEEE, 2016. p. 1-6.
Recommendation using product embeddings
● utilize session data about how customers browse your website
● represent each product as a dense numeric vector / embedding
● real-time - retrieve vectors, combine them and compute similarities
● able to capture a product's style, color, category or price level
● contact me:
○ jakub.macina@exponea.com
○ @dmacjam
Resources
● https://www.slideshare.net/xamat/recommender-systems-machine-learning-summer-school-2014-cmu
● BARKAN, Oren; KOENIGSTEIN, Noam. Item2vec: neural item embedding for collaborative filtering. In: Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on. IEEE, 2016. p. 1-6.
● YOUNG, Tom, et al. Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709, 2017.