1. Zero Shot Recommenders, LLMs and Prompt Engineering
Towards Building Foundation Models in Recommender Systems
PRS Workshop, Netflix, June 9th, 2023
Hao Ding (haodin, haoding2019) and Anoop Deoras (adeoras)
AWS AI, Amazon
2. Our Mission at AWS
Put Machine Learning in the Hands of Every Developer
3. The AWS ML Stack
Broadest and Most Complete Set of ML Capabilities
[Diagram of the AWS ML stack; the new top layer, GenAI, includes Bedrock and CodeWhisperer.]
4. Amazon Personalize
Who are we, in a nutshell?
• Customers can elevate the user experience with ML-powered personalization
• We cater to many thousands of customers across diverse domains
• Such as retail, news and media, video on demand, travel and hospitality, etc.
• We provide recommendations that respond in real-time to changing user behavior
• In short, we provide the concierge service for all things personalization
6. Customer Obsessed Science
Applied Research at AWS AI
• Constantly innovating on behalf of the customers
• Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world
• Science at Amazon enables new customer experiences, addresses existing customer pain points, and complements the engineering and product disciplines
7. 3 Anchors for the Discussion Today
Cold Start, Foundation Models in RecSys, and LLMs
• Cold Start Problems in Recommender Systems
• Foundation Models in Recommender Systems
• The role Large Language Models (LLMs) can play in Recommender Systems
8. 3 Cold Start Problems in Recommender Systems
• Cold Users: users at inference time were unseen during training, and the model needs to generalize
• Cold Items: new items get introduced to the catalogue
• Cold Domains: target-domain data is available only at inference time; no models can be built
• Less extreme case: domains with very little training data or a less frequent training cadence
• Performance of RecSys relies heavily on the amount of training data available
9. Foundation Models in Recommender Systems
Why should we talk about them?
• Definition of a Foundation Model: a model trained on broad data that can be adapted to a wide range of downstream tasks.
• Why Foundation Models in RecSys? Two main selling points:
• They encode “world knowledge”, and are thus complementary to models trained on a domain's behavioral data
• LLM Foundation Models' interactive nature can potentially help with explaining recommendations
10. Two Approaches for Building Foundation Models
RecSys from Other Domains, Large Language Models
• We will talk about two research efforts:
• ZeroShot Learning: can we leverage the knowledge in one domain to kick-start recommendations in a completely different domain?
• ZeroShot Inference: we will further assume that we have no source domain to rely on; how can we kick-start recommendations with large language models?
12. The Status-Quo
Collaborative Filtering, Item IDs and their Embeddings
• Current RecSys models learn item ID embeddings through interactions
• Item ID embeddings are parameters of your neural network, learned via backpropagation
• These embeddings are indexed by categorical, domain-specific item IDs
• They are transductive and do not generalize to unseen items (see the sketch below)
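A minimal sketch of this status quo (catalogue size, dimensions, and IDs are illustrative): the embedding table is an ordinary learned parameter, and an ID outside the training vocabulary simply has no row.

import torch
import torch.nn as nn

NUM_ITEMS, DIM = 10_000, 64  # illustrative catalogue size and embedding width

# Item ID embeddings are ordinary network parameters, learned via backprop.
id_embeddings = nn.Embedding(num_embeddings=NUM_ITEMS, embedding_dim=DIM)

seen_item = torch.tensor([42])
vec = id_embeddings(seen_item)           # fine: ID 42 existed during training

unseen_item = torch.tensor([NUM_ITEMS])  # a brand-new item added after training
# id_embeddings(unseen_item)             # IndexError: no row was ever learned for it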
13. Concept of Universal Item Embeddings
Collaborative Filtering, Item IDs and their Embeddings
• The idea behind universal item embeddings is to tap into the item's content information
• e.g., natural-language product descriptions, movie synopses, etc.
• Strong NLP models are used to obtain continuous universal item representations
• Universal user representations can then be built on top of these universal item representations.
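A minimal sketch of this step, assuming the Hugging Face transformers library; the checkpoint and the mean-pooling choice are illustrative assumptions, not necessarily the exact recipe used here.

import torch
from transformers import AutoModel, AutoTokenizer

# Any strong pretrained NLP model can serve as the text encoder;
# the checkpoint choice is illustrative.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def universal_item_embedding(description: str) -> torch.Tensor:
    """Map an item's natural-language description to a continuous vector."""
    inputs = tokenizer(description, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    # Mean-pool the token representations into one item vector (one common choice).
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

movie_vec = universal_item_embedding(
    "A man is unwittingly the star of a televised reality show."
)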
14. Introducing ZESRec [1]
Zero Shot Recommender System
[1] “Zero Shot Recommender Systems”, Hao Ding, Anoop Deoras, Yuyang Wang, Hao Wang. ICLR Workshop 2022
• ZESRec learns universal item embeddings from domain-agnostic generic features, namely text
• ZESRec adopts sequential recommenders, which generate the universal user embeddings
15. We want to ask 2 questions about ZESRec
Relevance, Lead Time
• How relevant are ZESRec recommendations compared to a fully trained system?
• How much in-domain data is needed to outperform ZESRec?
• How long is the lead time?
16. High Level Approach
ZESRec Training
[Architecture diagram: each history item's text is encoded by a pretrained BERT model and a 1-layer NN into a universal item embedding, and a latent item offset vector is added; a sequential (SEQ) model, together with a latent user offset vector, aggregates the history into the universal user embedding, which is multiplied against candidate item embeddings (pretrained BERT model + 1-layer NN + latent item offset vector) to produce prediction scores.]
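A minimal sketch of the scoring path in the diagram above; the GRU choice, layer sizes, and the simplified offset handling are illustrative assumptions, not ZESRec's exact implementation.

import torch
import torch.nn as nn

DIM_BERT, DIM = 768, 64  # illustrative: BERT output width, embedding width

class ZESRecSketch(nn.Module):
    """Illustrative skeleton of the training-time architecture above."""

    def __init__(self, num_items: int):
        super().__init__()
        self.item_proj = nn.Linear(DIM_BERT, DIM)          # the 1-layer NN
        self.item_offset = nn.Embedding(num_items, DIM)    # latent item offset vectors
        self.user_offset = nn.Parameter(torch.zeros(DIM))  # latent user offset (simplified)
        self.seq = nn.GRU(DIM, DIM, batch_first=True)      # the SEQ blocks

    def forward(self, hist_bert, hist_ids, cand_bert, cand_ids):
        # Universal item embeddings: projected BERT vectors plus learned offsets.
        hist = self.item_proj(hist_bert) + self.item_offset(hist_ids)
        cand = self.item_proj(cand_bert) + self.item_offset(cand_ids)
        # Universal user embedding: sequence model over the history, plus offset.
        _, h = self.seq(hist)
        user = h[-1] + self.user_offset
        # Prediction scores: user x item inner products (the "X" in the diagram).
        return (user.unsqueeze(1) @ cand.transpose(-1, -2)).squeeze(1)

The offsets are exactly what must be dropped at zero-shot inference time on a new domain, since unseen items and users never had offsets trained (next slide).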
17. High Level Approach
ZESRec Inference
[Same architecture at inference time: the pretrained BERT model and 1-layer NN produce universal item embeddings directly from item text, the SEQ model produces the universal user embedding, and prediction scores are computed as before; the latent offset vectors are absent, since unseen items and users have no learned offsets.]
19. Results
How long before the in-domain model takes over?
[Plots: Recall@20 vs. number of interactions (0 to 10K) on the MIND dataset and the Amazon dataset.]
21. From ZeroShot Learning to ZeroShot Inference
Task and Limitations
• Now let's imagine we don't have the luxury of any source-domain RecSys at all
• How realistic is this assumption? Answer: quite realistic (startups, new business lines, etc.)
• What can we do?
• There is no learning part left for ZeroShot Learning
• We need to resort to ZeroShot Inference
22. LLM Foundation Models to the rescue
Can we kick-start recommendations using Large Language Models?
• Pre-trained language models such as BERT and GPT learn general text representations
• They encode “world knowledge”
• Question we want to ask: can we leverage these powerful LLMs as recommender systems?
• Use prompts to reformulate the session-based recommendation task
23. Introducing LMRecSys[3]
Converting a user's interaction history into a text inquiry — prompts
• Prompt: "A user watched Jaws, Saving Private Ryan, The Good, the Bad, and the Ugly, Run Lola Run, Goldfinger. Now the user may want to watch __ __ __"
• A large pre-trained language model (J1-Jumbo, 178B parameters) fills in the blanks; the continuations quoted below were generated by the model:
• Knowledge: "… science fiction film directed by Peter Weir. The screenplay by Andrew Nicole was adapted from Nicole's 1997 novel of the same name. The film tells the story of Truman Burbank, a man who is unwittingly placed in a televised reality show that broadcasts every aspect of his life without his knowledge."
• Reasoning: "A user watched Jaws, Saving Private Ryan, The Good, the Bad, and the Ugly, Run Lola Run, Goldfinger. Now the user may want to watch something funny and light-hearted to comfort him after having seen some horrors."
• A traditional recommender system (e.g., GRU4Rec) models p(x_t | x_1, …, x_{t-1}) directly over item IDs (Item 1, Item 2, …, Item N)
• LMRecSys instead uses PLMs as the recommender system, modeling p(d(x_t) | f([d(x_1), …, d(x_{t-1})])), where d maps each item to its text description and f formats the descriptions into a prompt; the predicted token distributions from the language model are then mapped back onto the item catalogue to rank items
• Goal: enable zero-shot recommendation and improve data efficiency
[3] "Language Models as Recommender Systems: Evaluations and Limitations", Yuhui Zhang, Hao Ding, Zeren Shui, Yifei Ma, James Zou, Anoop Deoras, Hao Wang. NeurIPS Workshop 2021
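A minimal sketch of this reformulation; the helpers d and f mirror the notation above, and while the item IDs and titles echo the slide's example, their pairing here is illustrative.

# d(x): map an item ID to its textual description (here, just the title).
ITEM_TEXT = {
    372: "Jaws",
    168: "Saving Private Ryan",
    413: "The Good, the Bad, and the Ugly",
    77: "Run Lola Run",
    952: "Goldfinger",
}

def d(item_id: int) -> str:
    return ITEM_TEXT[item_id]

# f([...]): format the textualized history into a cloze-style prompt whose
# blanks the language model is asked to fill with the next item's tokens.
def f(history_texts: list[str], num_blanks: int = 3) -> str:
    return (f"A user watched {', '.join(history_texts)}. "
            f"Now the user may want to watch {' '.join(['__'] * num_blanks)}")

prompt = f([d(x) for x in [372, 168, 413, 77, 952]])
# "A user watched Jaws, ..., Goldfinger. Now the user may want to watch __ __ __"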
24. Generation OR Multi-Token Inference
Answering the question of how to be faithful to one’s catalogue
• A sequence of item IDs can be mapped to a long prompt
• How do we obtain a ranked list of next-item recommendations?
• Generation of free-form text — need to be careful with hallucination
• Probability assignment over the available catalogue (sketched below)
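A minimal sketch of the probability-assignment option, assuming an open-source causal LM (gpt2 here purely for illustration) with the Hugging Face transformers API; scores are length-normalized mean log-probabilities, and because every score attaches to a real catalogue entry, hallucination is ruled out by construction.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative small LM
model = AutoModelForCausalLM.from_pretrained("gpt2")
CATALOGUE = ["Goldfinger", "The Truman Show", "Airplane!"]  # toy catalogue

def item_score(prompt: str, item_name: str) -> float:
    """Length-normalized log-probability of the item name continuing the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    item_ids = tokenizer(" " + item_name, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, item_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probs of each item token, conditioned on everything before it.
    logp = logits.log_softmax(-1)[0, prompt_ids.size(1) - 1 : -1]
    token_logp = logp.gather(1, item_ids[0].unsqueeze(1)).squeeze(1)
    return token_logp.mean().item()  # mean over tokens = length normalization

prompt = "A user watched Jaws, Saving Private Ryan. Now the user may want to watch"
ranked = sorted(CATALOGUE, key=lambda name: item_score(prompt, name), reverse=True)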
25. A Few Open Questions
Linguistic & Seq. Length Biases, Scales of LM and Creative Prompts
• Multi-Token Inference: length normalization is important; recommendations are highly sensitive to inference methods
• Linguistic Biases Disentanglement: item names need not be fluent English
• Scales of Language Models: model size has a significant impact on performance and latency
• Prompt Engineering: it's important to design the right prompts
27. The world after ChatGPT
Unleashing the immense power of Large Language Models
28. Recent Advances in Merging LLMs with RecSys
Fine-Tuning an LLM
• P5 [4]: designed a text-to-text fine-tuning paradigm based on the pre-trained T5
• M6-Rec [5]: generative pretrained language models as open-ended recommender systems
[4] "Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)", Shijie Geng et al. RecSys 2022
[5] "M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems", Zeyu Cui et al. ArXiv 2022
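A minimal sketch of the text-to-text fine-tuning idea (not P5's actual prompt templates or training setup; the "User_123" prompt wording is made up for illustration): recommendation becomes a conditional generation problem over a pretrained T5.

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")  # illustrative T5 checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Recommendation cast as text-to-text: the user's context is the input
# sequence, the target item is the output sequence.
source = "User_123 has watched Jaws, Goldfinger. What should the user watch next?"
target = "Run Lola Run"

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # fine-tune by minimizing this loss
loss.backward()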
29. Recent Advances in Merging LLMs with RecSys
Inference with LLMs
• NIR [6], Chat-REC [7], and [8] propose to directly recommend using LLMs — inference only
• Most effort is spent on "prompt engineering"
• Optimal encoding of user context in the prompts
• "Out of vocabulary" problems solved using techniques such as candidate pools and text matching (see the sketch after this list)
• Mixed success. Still a long way to go.
[6] "Zero-Shot Next-Item Recommendation using Large Pretrained Language Models", Lei Wang and Ee-Peng Lim. ArXiv 2023
[7] "Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System", Yunfan Gao et al. ArXiv 2023
[8] "Is ChatGPT a Good Recommender? A Preliminary Study", Junling Liu et al. ArXiv 2023
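A minimal sketch of the text-matching fix for "out of vocabulary" outputs: a free-form LLM suggestion is snapped onto the closest item in a candidate pool (difflib here is one illustrative choice; embedding similarity is another).

import difflib

CANDIDATE_POOL = ["Run Lola Run", "Goldfinger", "The Truman Show"]  # toy pool

def ground_in_catalogue(llm_output: str, pool: list[str]) -> str | None:
    """Snap a possibly hallucinated title onto the closest real catalogue item."""
    matches = difflib.get_close_matches(llm_output, pool, n=1, cutoff=0.6)
    return matches[0] if matches else None

ground_in_catalogue("Run, Lola, Run!", CANDIDATE_POOL)  # -> "Run Lola Run"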
30. Concluding Remarks
• With the goal of building foundation models in RecSys, our efforts have been made in two directions:
• Extract knowledge from data in similar domains
• Use generic world knowledge
• We believe the ultimate path is a hybrid of both: ZESRec + LMRecSys