Zero Shot Recommenders,
LLMs and Prompt Engineering
PRS Workshop, Netflix, 2023
June 9th, 2023
Hao Ding (haodin haoding2019) and Anoop Deoras (adeoras)
AWS AI, Amazon
1
Towards Building Foundation Models in Recommender Systems
Our Mission at AWS
Put Machine Learning in the Hands of Every Developer
2
The AWS ML Stack
Broadest and Most Complete Set of ML Capabilities
GenAI
NEW
Bedrock
CodeWhisperer
3
Amazon Personalize
Who are we in a nutshell?
• Customers can elevate the user experience with ML-powered personalization
• We cater to many thousands of customers from many diverse domains
• Such as: Retail, News and Media, Video on Demand, Travel and Hospitality, ...
• We provide recommendations that respond in real time to changing user behavior
• In short, we provide the concierge service for all things personalization
4
Amazon Personalize
Who are we in a nutshell?
5
Customer Obsessed Science
Applied Research at AWS AI
• Constantly innovating on behalf of the customers
• Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world
• Science at Amazon enables new customer experiences, addresses existing customer pain points, and complements engineering and product disciplines.
6
3 Anchors for the Discussion Today
ColdStart, Foundation Models in RecSys and LLMs
• Cold Start Problems in Recommender Systems
• Foundation Models in Recommender Systems
• The role Large Language Models (LLMs) can play in Recommender Systems
7
3 Cold Start Problems in Recommender Systems
• Cold Users: Users seen at inference time were not seen during training, so the model needs to generalize
• Cold Items: New items are introduced to the catalogue
• Cold Domains: Target-domain data is available only at inference time, so no model can be trained on it.
• Less extreme case: Domains with very little training data or a less frequent training cadence
• Performance of RecSys relies heavily on the amount of training data available
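As a rough illustration of the first two cases, cold users and cold items can be found by comparing the IDs seen in training with those arriving at inference time. The function name and the toy (user, item) pairs below are hypothetical and are not taken from the talk.

```python
def find_cold_entities(train_interactions, inference_interactions):
    """Return (cold_users, cold_items): IDs seen at inference but never in training.

    Each interaction is a (user_id, item_id) pair; this layout is an assumption.
    """
    train_users = {u for u, _ in train_interactions}
    train_items = {i for _, i in train_interactions}
    cold_users = {u for u, _ in inference_interactions if u not in train_users}
    cold_items = {i for _, i in inference_interactions if i not in train_items}
    return cold_users, cold_items


train = [("u1", "i1"), ("u1", "i2"), ("u2", "i1")]
inference = [("u2", "i3"), ("u3", "i1")]
cold_users, cold_items = find_cold_entities(train, inference)
print(sorted(cold_users), sorted(cold_items))  # ['u3'] ['i3']
```

In a cold domain, every user and item would land in these sets, which is the case the rest of the talk addresses.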
8
Foundation Models in Recommender Systems
Why should we talk about them?
• Definition of a Foundation Model: A model trained on broad data that can be adapted to a wide range of downstream tasks.
• Why Foundation Models in RecSys? Two main selling points:
• They encode “world knowledge”, and are thus complementary to models trained on a domain’s behavioral data
• The interactive nature of LLM Foundation Models can potentially help with explaining the recommendations
9
Two Approaches for Building Foundation Models
RecSys from Other Domains, Large Language Models
• We will talk about 2 research efforts
• ZeroShot Learning: Can we leverage the knowledge in one domain to kick-start recommendations in a completely different domain?
• ZeroShot Inference: We further assume that we have no source domain to rely on. How can we kick-start recommendations with large language models?
10
ZeroShot Learning
Kind of Like Domain Adaptation but with zero User/Item overlap
11
The Status-Quo
Collaborative Filtering, Item IDs and their Embeddings
• Current RecSys models learn item ID embeddings through interactions
• Item ID embeddings are parameters of the neural network, learned via backpropagation
• These embeddings are indexed by categorical, domain-specific item IDs
• They are transductive and do not generalize to unseen items
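The limitation can be sketched with a plain lookup table: an embedding exists only for IDs that were present during training. The item IDs, the dimension, and the random values below are illustrative assumptions, standing in for parameters that a real model would learn.

```python
import random

random.seed(0)
EMBEDDING_DIM = 4

# One trainable vector per item ID seen during training (random stand-in values).
item_id_embeddings = {
    item_id: [random.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for item_id in ("item_372", "item_168", "item_413")
}


def lookup(item_id):
    """Return the learned vector for a known ID, or None for an unseen one."""
    return item_id_embeddings.get(item_id)


print(lookup("item_168") is not None)  # True: seen during training
print(lookup("item_999"))              # None: a new catalogue item has no embedding
```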
12
Concept of Universal Item Embeddings
Collaborative Filtering, Item IDs and their Embeddings
• The idea behind universal item embeddings is to tap into an item’s content information.
• e.g. natural language product description, movie synopsis, etc.
• Strong NLP models are used to obtain continuous universal item representations
• Universal user representations can then be built on top of these universal item representations.
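A minimal sketch of the idea, assuming a toy hashed bag-of-words encoder in place of a strong NLP model and a plain average in place of a learned user model. Both substitutions, and all names below, are illustrative only.

```python
import hashlib
import math

DIM = 16


def encode_text(text):
    """Toy stand-in for a pretrained text encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def user_embedding(history_texts):
    """Average the item vectors of a user's history (a simple stand-in for a user model)."""
    vectors = [encode_text(t) for t in history_texts]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Any item with a description gets a vector, including one never seen in training.
new_item = encode_text("A shark terrorises a New England beach town")
user = user_embedding(["A shark thriller set at sea", "A war drama set in Normandy"])
print(len(new_item), len(user))  # 16 16
```

Because the vectors depend only on text, users and items from different domains live in the same space, which is what makes transfer without ID overlap possible.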
13
Introducing ZESRec [1]
Zero Shot Recommender System
[1] “Zero Shot Recommender Systems”, Hao Ding, Anoop Deoras, Yuyang Wang, Hao Wang. ICLR Workshop 2022
• ZESRec learns universal item embeddings from a domain-agnostic, generic feature: text
• ZESRec adopts sequential recommenders, which generate the universal user embeddings
14
We want to ask 2 questions about ZESRec
Relevance, Lead Time
• How relevant are ZESRec recommendations compared to those of a fully trained system?
• How much in-domain data is needed to outperform ZESRec?
• How long is the lead time?
15
High Level Approach
ZESRec Training
[Figure: ZESRec training. Components shown: a pretrained BERT model and a 1-layer NN producing item universal embeddings, latent item offset vectors added (+) to them, a sequential model (SEQ) producing the user universal embedding with a latent user offset vector added (+), and a product (X) of user and item embeddings giving prediction scores.]
16
High Level Approach
ZESRec Inference
[Figure: ZESRec inference. Components shown: a pretrained BERT model and a 1-layer NN producing item universal embeddings, a sequential model (SEQ) producing the user universal embedding, and a product (X) of user and item embeddings giving prediction scores. No latent offset vectors appear in this diagram.]
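The scoring step can be sketched as follows, assuming the prediction score is a softmax over dot products between the user embedding and each item embedding. The slides do not spell out the normalisation, and the vectors below are made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_items(user_vec, item_vecs):
    """Softmax over user-item dot products; returns {item_name: probability}."""
    logits = {name: dot(user_vec, vec) for name, vec in item_vecs.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(l - max_logit) for name, l in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


user_vec = [1.0, 0.0, 0.5]
item_vecs = {
    "item_a": [0.9, 0.1, 0.4],
    "item_b": [0.0, 1.0, 0.0],
    "item_c": [0.5, 0.5, 0.5],
}
scores = score_items(user_vec, item_vecs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['item_a', 'item_c', 'item_b']
```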
17
Results
Efficacy
18
Results
How long before the In-Domain Model takes over?
19
[Figure: Recall@20 versus number of interactions (0 to 10K), one panel each for the MIND dataset and the Amazon dataset.]
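The metric on the vertical axis is Recall@20. The talk does not define it, so the sketch below uses one common definition: the fraction of a user's held-out items that appear in the top 20 recommendations. The item names are hypothetical.

```python
def recall_at_k(ranked_items, relevant_items, k=20):
    """Fraction of the relevant items that appear in the top-k of the ranked list."""
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    hits = sum(1 for item in relevant_items if item in top_k)
    return hits / len(relevant_items)


ranked = [f"item_{i}" for i in range(100)]               # model output, best first
relevant = {"item_3", "item_19", "item_20", "item_57"}   # held-out interactions
print(recall_at_k(ranked, relevant, k=20))  # 0.5
```

When each user has a single held-out next item, this reduces to a hit rate at 20.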
ZeroShot Inference
No reference recommender system at hand
20
From ZeroShot Learning to ZeroShot Inference
Task and Limitations
• Now let’s imagine we don’t even have the luxury of a source-domain RecSys
• How realistic is this assumption? Answer: quite realistic (startups, new business lines, ...)
• What can we do?
• There is no learning part left for ZeroShot Learning
• We need to resort to ZeroShot Inference
21
LLM Foundation Models to the rescue
Can we kick-start recommendations using Large Language Models?
• Pre-trained language models such as BERT and GPT learn general text representations
• They encode “world knowledge”
• The question we want to ask: can we leverage these powerful LLMs as recommender systems?
• Use prompts to reformulate the session-based recommendation task
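Reformulating a session as a prompt can be as simple as joining item titles into a sentence. The sketch below borrows the wording of the example prompt used in this deck. The function name and its default argument are hypothetical.

```python
def build_prompt(watched_titles, continuation="Now the user may want to watch"):
    """Turn a list of item titles into a natural-language prompt for a language model."""
    history = ", ".join(watched_titles)
    return f"A user watched {history}. {continuation}"


prompt = build_prompt(["Jaws", "Saving Private Ryan", "Run Lola Run", "Goldfinger"])
print(prompt)
# A user watched Jaws, Saving Private Ryan, Run Lola Run, Goldfinger. Now the user may want to watch
```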
22
Introducing LMRecSys[3]
Converting user’s interaction history into a text inquiry — Prompts
Knowledge: science fiction film directed by Peter Weir. The screenplay by Andrew Nicole was adapted from Nicole’s 1997 novel of the same name. The film tells the story of Truman Burbank, a man who is unwittingly placed in a televised reality show that broadcasts every aspect of his life without his knowledge.
Reasoning: A user watched Jaws, Saving Private Ryan, The Good, the Bad, and the Ugly, Run Lola Run, Goldfinger. Now the user may want to watch something funny and light-hearted comfort him after having seen some horrors.
Model: J1-Jumbo, a large pre-trained language model (178B parameters). Bolded texts are generated by the model.
Goal
• Enable zero-shot recommendation
• Improve data efficiency
[Figure: GRU4Rec (traditional recommender system) versus LMRecSys (PLMs as recommender system).]
• GRU4Rec: takes item IDs (Item 372, Item 168, Item 413, Item 77, Item 952) and predicts a distribution over Item 1 ... Item N, $p(x_t \mid x_1, \ldots, x_{t-1})$, from which the recommended item is chosen.
• LMRecSys: takes the prompt “A user watched Jaws, Saving Private Ryan, The Good, the Bad, and the Ugly, Run Lola Run, Goldfinger. Now the user may want to watch __ __ __”, obtains predicted token distributions from the language model (Token 1 ... Token V for each blank), and converts them into a distribution over Item 1 ... Item N, $p(d(x_t) \mid f([d(x_1), \ldots, d(x_{t-1})]))$, from which the recommended item is chosen.
[3] “Language Models as Recommender Systems: Evaluations and Limitations”,
Yuhui Zhang, Hao Ding, Zeren Shui, Yifei Ma, James Zou, Anoop Deoras, Hao Wang. NeurIPS Workshop 2021
23
Generation OR Multi-Token Inference
Answering the question of how to be faithful to one’s catalogue
• A sequence of item IDs can be mapped to a long prompt
• How do we obtain a ranked list of next-item recommendations?
• Generation of free-form text: need to be careful with hallucination
• Probability assignment over the available catalogue
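The second option, probability assignment, can be sketched by scoring every catalogue title under the language model and sorting. The toy token table below is a hypothetical stand-in that ignores the prompt. A real system would query a language model for each token's probability given the prompt and the preceding tokens.

```python
import math

# Hypothetical stand-in for a language model: fixed token probabilities that
# ignore the prompt. A real system would query an LM for p(token | prompt, previous tokens).
TOY_TOKEN_PROBS = {"toy": 0.20, "story": 0.10, "alien": 0.05, "the": 0.30, "godfather": 0.02}
UNKNOWN_PROB = 1e-4


def token_log_prob(prompt, previous_tokens, token):
    return math.log(TOY_TOKEN_PROBS.get(token, UNKNOWN_PROB))


def title_log_prob(prompt, title):
    """Chain rule: log p(title | prompt) is the sum of per-token log-probabilities."""
    tokens = title.lower().split()
    return sum(token_log_prob(prompt, tokens[:i], tok) for i, tok in enumerate(tokens))


def rank_catalogue(prompt, catalogue):
    """Score every catalogue title, so the output can never leave the catalogue."""
    return sorted(catalogue, key=lambda t: title_log_prob(prompt, t), reverse=True)


catalogue = ["Toy Story", "Alien", "The Godfather"]
prompt = "A user watched Jaws, Goldfinger. Now the user may want to watch"
print(rank_catalogue(prompt, catalogue))  # ['Alien', 'Toy Story', 'The Godfather']
```

Unlike free-form generation, this cannot hallucinate an item, at the cost of one scoring pass per catalogue entry.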
24
A Few Open Questions
Linguistic & Seq. Length Biases, Scales of LM and Creative Prompts
• Multi-Token Inference: Length normalization is important. Recommendations are highly sensitive to the inference method.
• Linguistic Biases Disentanglement: Item names need not be fluent English.
• Scales of Language Models: Model size has a significant impact on performance and latency
• Prompt Engineering: It’s important to design the right prompts
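To see why length normalization matters in multi-token inference, compare a summed log-probability with a per-token average. The per-token values below are made up for illustration, and averaging is only one common form of normalization, not necessarily the one used in the work presented here.

```python
# Hypothetical per-token log-probabilities for two catalogue titles, as a language
# model might return them; the numbers are made up for illustration.
token_log_probs = {
    "Alien": [-3.0],
    "The Good, the Bad and the Ugly": [-1.0, -1.2, -0.8, -1.1, -0.9, -0.7, -1.3],
}


def sum_score(log_probs):
    """Unnormalised sequence log-probability: penalises every extra token."""
    return sum(log_probs)


def length_normalised_score(log_probs):
    """Average log-probability per token, one common form of length normalisation."""
    return sum(log_probs) / len(log_probs)


best_by_sum = max(token_log_probs, key=lambda t: sum_score(token_log_probs[t]))
best_by_mean = max(token_log_probs, key=lambda t: length_normalised_score(token_log_probs[t]))
print(best_by_sum)   # Alien
print(best_by_mean)  # The Good, the Bad and the Ugly
```

The two scoring rules pick different winners from the same token probabilities, which illustrates how sensitive the ranking is to the inference method.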
25
Some Results
Experiments, Setup and Observations
26
[Figure: results on ML 1M.]
The world after ChatGPT
Unleashing the immense power of Large Language Models
27
Recent Advances in Merging LLMs with RecSys
FineTuning an LLM
M6-Rec[5]:
P5 [4]: designed a text-to-text fine-tuning paradigm based on the pre-trained T5.
[4] “Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)”, Shijie Geng et al. RecSys 2022
[5] “M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems”, Zeyu Cui et al. ArXiv 2022
28
Recent Advances in Merging LLMs with RecSys
Inference with LLM
• NIR [6], Chat-REC [7] and [8] propose to recommend directly with LLMs, using inference only.
• Most effort is spent on “Prompt Engineering”
• Optimal encoding of user context in the prompts
• “Out of Vocabulary” problems are handled with techniques such as candidate pools and text matching
• Mixed success. Still a long way to go.
[6] “Zero-Shot Next-Item Recommendation using Large Pretrained Language Models”, Lei Wang and Ee-Peng Lim. ArXiv 2023
[7] “Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System”, Yunfan Gao et al. ArXiv 2023
[8] “Is ChatGPT a Good Recommender? A Preliminary Study”, Junling Liu et al. ArXiv 2023
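Text matching against a candidate pool can be sketched with fuzzy string matching from the Python standard library. This is a generic illustration of the idea, not the method of any of the cited papers. The function name, the catalogue, and the 0.6 cutoff are assumptions.

```python
import difflib


def match_to_catalogue(generated_title, catalogue, cutoff=0.6):
    """Map free-form LLM output to the closest catalogue title, or None if nothing is close."""
    lowered = {title.lower(): title for title in catalogue}
    matches = difflib.get_close_matches(generated_title.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


catalogue = ["The Truman Show", "Toy Story", "Run Lola Run"]
print(match_to_catalogue("the truman show (1998)", catalogue))      # The Truman Show
print(match_to_catalogue("A Film That Does Not Exist", catalogue))  # None
```

Returning None for poor matches gives the caller a chance to drop hallucinated titles instead of forcing them onto an unrelated item.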
Concluding Remarks
• With the goal of building foundation models in RecSys, we have pursued two directions:
• Extract knowledge from data in similar domains
• Use generic world knowledge
• We believe the ultimate path is a hybrid of both: ZESRec + LMRecSys
30
Thank you
Happy to take questions now
31

 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 

Foundation Models in Recommender Systems

  • 1. Zero Shot Recommenders, LLMs and Prompt Engineering. PRS Workshop, Netflix, 2023. June 9th, 2023. Hao Ding (haodin haoding2019) and Anoop Deoras (adeoras), AWS AI, Amazon. Towards Building Foundation Models in Recommender Systems
  • 2. Our Mission at AWS: Put Machine Learning in the Hands of Every Developer
  • 3. The AWS ML Stack: Broadest and Most Complete Set of ML Capabilities. GenAI (new): Bedrock, CodeWhisperer
  • 4. Amazon Personalize: Who are we in a nutshell? • Customers can elevate the user experience with ML-powered personalization • We cater to many thousands of customers from many diverse domains • Such as: Retail, News and Media, Video on Demand, Travel and Hospitality, etc. • We provide recommendations that respond in real time to changing user behavior • In short, we provide the concierge service for all things personalization
  • 5. Amazon Personalize: Who are we in a nutshell?
  • 6. Customer Obsessed Science: Applied Research at AWS AI • Constantly innovating on behalf of the customers • Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world • Science at Amazon enables new customer experiences, addresses existing customer pain points, and complements engineering and product disciplines.
  • 7. 3 Anchors for the Discussion Today: Cold Start, Foundation Models in RecSys and LLMs • Cold Start Problems in Recommender Systems • Foundation Models in Recommender Systems • The role Large Language Models (LLMs) can play in Recommender Systems
  • 8. 3 Cold Start Problems in Recommender Systems • Cold Users: Users seen at inference time were not seen during training, so the model needs to generalize • Cold Items: New items get introduced to the catalogue • Cold Domains: Target data is available only for inference. No models can be built. • Less extreme case: Domains with very little training data / less frequent training cadence • Performance of RecSys relies heavily on the amount of training data available
  • 9. Foundation Models in Recommender Systems: Why should we talk about them? • Definition of a Foundation Model: A model trained on broad data that can be adapted to a wide range of downstream tasks. • Why Foundation Models in RecSys? Two main selling points: • They encode “world knowledge”, and are thus complementary to models trained on a domain’s behavioral data • LLM Foundation Models’ interactive nature can potentially help with explaining the recommendations
  • 10. Two Approaches for Building Foundation Models: RecSys from Other Domains, Large Language Models • We will talk about two research efforts • ZeroShot Learning: Can we leverage the knowledge in one domain to kick-start recommendations in a completely different domain? • ZeroShot Inference: We will further assume that we have no source domain to rely on. How can we kick-start recommendations with large language models?
  • 11. ZeroShot Learning: Kind of like domain adaptation, but with zero user/item overlap
  • 12. The Status Quo: Collaborative Filtering, Item IDs and their Embeddings • Current RecSys models learn item ID embeddings through interactions • Item ID embeddings are parameters of the neural network and are learned via backpropagation • These embeddings are indexed by categorical, domain-specific item IDs • They are transductive and do not generalize to unseen items
  • 13. Concept of Universal Item Embeddings: Collaborative Filtering, Item IDs and their Embeddings • The idea behind universal item embeddings is to tap into each item’s content information, e.g. a natural-language product description or a movie synopsis • Strong NLP models are used to obtain continuous universal item representations • Universal user representations can then be built on top of these universal item representations.
  • 14. Introducing ZESRec [1]: Zero Shot Recommender System • ZESRec learns the universal item embeddings from domain-agnostic generic features, namely text • ZESRec adopts a sequential recommender, which generates the universal user embeddings. [1] “Zero Shot Recommender Systems”, Hao Ding, Anoop Deoras, Yuyang Wang, Hao Wang. ICLR Workshop 2022
  • 15. We want to ask 2 questions about ZESRec: Relevance, Lead Time • How relevant are ZESRec recommendations compared to a fully trained system? • How much in-domain data is needed to outperform ZESRec? • How long is the lead time?
  • 16. High Level Approach: ZESRec Training. [Architecture diagram. Labeled components: Pretrained BERT Model, 1-Layer NN, Item Universal Embedding with an added Latent Item Offset Vector, sequential model (SEQ), User Universal Embedding with an added Latent User Offset Vector, and Prediction Scores over items (0.36, 0.29, …, 0.09, 0.02).]
  • 17. High Level Approach: ZESRec Inference. [Architecture diagram. Labeled components: Pretrained BERT Model, 1-Layer NN, Item Universal Embedding, sequential model (SEQ), User Universal Embedding, and Prediction Scores over items (0.36, 0.29, …, 0.09, 0.02). No latent offset vectors appear in this diagram.]
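The components named on slides 16 and 17 suggest a scoring pipeline along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the text vectors are random stand-ins for pretrained BERT outputs, `mean_pool` is a placeholder for the sequential model (SEQ), and the "X" in the diagram is assumed to be a dot product followed by a softmax. All function names are hypothetical.

```python
import math
import random


def linear(vec, weights):
    """Bias-free 1-layer projection; `weights` is a list of output rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def mean_pool(vectors):
    """Placeholder for the sequential model (SEQ): average the item embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_items(history_text_vecs, candidate_text_vecs, weights):
    """Score candidates by the dot product of user and item embeddings."""
    # Item universal embeddings: text vector -> 1-layer NN.
    history = [linear(v, weights) for v in history_text_vecs]
    candidates = [linear(v, weights) for v in candidate_text_vecs]
    # User universal embedding: built from the items the user interacted with.
    user = mean_pool(history)
    logits = [sum(u * i for u, i in zip(user, item)) for item in candidates]
    return softmax(logits)


rng = random.Random(0)
TEXT_DIM, EMB_DIM = 8, 4
# Assumption: random vectors stand in for pretrained text-encoder outputs.
text_vecs = {name: [rng.gauss(0, 1) for _ in range(TEXT_DIM)] for name in "abcd"}
weights = [[rng.gauss(0, 0.5) for _ in range(TEXT_DIM)] for _ in range(EMB_DIM)]

probs = score_items(
    [text_vecs["a"], text_vecs["b"]],   # interaction history
    [text_vecs["c"], text_vecs["d"]],   # candidate items
    weights,
)
print(probs)
```

Because nothing in this path depends on an item ID, the same code can score items from a catalogue never seen in training, which is the property the inference slide relies on.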
  • 19. Results: How long before the in-domain model takes over? [Two plots of Recall@20 versus number of interactions (0 to 10K), one for the MIND dataset and one for the Amazon dataset.]
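The plots on slide 19 report Recall@20. The slide does not define the metric, so the sketch below uses the common definition (fraction of a user's held-out relevant items that appear in the top k recommendations); the function name and the sample data are hypothetical.

```python
def recall_at_k(ranked_items, relevant_items, k=20):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = set(ranked_items[:k]) & relevant
    return len(hits) / len(relevant)


# One of the two held-out items ("b") is in the top 2.
print(recall_at_k(["a", "b", "c"], ["b", "z"], k=2))
```

In next-item evaluation there is a single held-out item per user, so this reduces to a hit rate: 1 if the item is in the top k, else 0.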
  • 20. ZeroShot Inference: No reference recommender system at hand
  • 21. From ZeroShot Learning to ZeroShot Inference: Task and Limitations • Now let's imagine we don’t have the luxury of even having any source domain RecSys • How realistic is this assumption? Answer: Quite realistic (startups, new business lines, etc.) • What can we do? • There is no learning part left for ZeroShot Learning • We need to resort to ZeroShot Inference
  • 22. LLM Foundation Models to the rescue: Can we kick-start recommendations using Large Language Models? • Pre-trained language models such as BERT and GPT learn general text representations • They encode “world knowledge” • Question we want to ask: Can we leverage these powerful LLMs as recommender systems? • Use prompts to reformulate the session-based recommendation task
  • 23. Introducing LMRecSys [3]: Converting a user’s interaction history into a text inquiry (prompts). Goal: enable zero-shot recommendation and improve data efficiency. [Figure, language model examples of knowledge and reasoning from J1-Jumbo, a large pre-trained language model with 178B parameters; bolded texts are generated by the model: “… science fiction film directed by Peter Weir. The screenplay by Andrew Nicole was adapted from Nicole’s 1997 novel of the same name. The film tells the story of Truman Burbank, a man who is unwittingly placed in a televised reality show that broadcasts every aspect of his life without his knowledge.” and “A user watched Jaws, Saving Private Ryan, The Good, the Bad, and the Ugly, Run Lola Run, Goldfinger. Now the user may want to watch something funny and light-hearted comfort him after having seen some horrors.”] [Figure, model comparison: the traditional recommender system (GRU4Rec) takes item IDs (Item 372, Item 168, Item 413, Item 77, Item 952) and models p(x_t | x_1, …, x_{t−1}) over Item 1 … Item N; LMRecSys (PLMs as recommender system) takes the prompt “A user watched Jaws, Saving Private Ryan, The Good, the Bad, and the Ugly, Run Lola Run, Goldfinger. Now the user may want to watch __ __ __” and models p(d(x_t) | f([d(x_1), …, d(x_{t−1})])), mapping predicted token distributions from the language model (Token 1 … Token V per blank) to a recommended item among Item 1 … Item N.] [3] “Language Models as Recommender Systems: Evaluations and Limitations”, Yuhui Zhang, Hao Ding, Zeren Shui, Yifei Ma, James Zou, Anoop Deoras, Hao Wang. NeurIPS Workshop 2021
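The prompt shown on slide 23 can be assembled from a watch history with a simple template. This is a minimal sketch: the wording follows the example on the slide, while the function name and the `template` parameter are hypothetical.

```python
DEFAULT_TEMPLATE = "A user watched {history}. Now the user may want to watch"


def build_prompt(watched_titles, template=DEFAULT_TEMPLATE):
    """Turn an ordered list of item titles into a text prompt for a language model."""
    if not watched_titles:
        raise ValueError("at least one watched title is required")
    return template.format(history=", ".join(watched_titles))


prompt = build_prompt(
    ["Jaws", "Saving Private Ryan", "The Good, the Bad, and the Ugly",
     "Run Lola Run", "Goldfinger"]
)
print(prompt)
```

The language model is then asked either to continue this text or to assign a probability to each catalogue title as the continuation.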
  • 24. Generation or Multi-Token Inference: Answering the question of how to be faithful to one’s catalogue • A sequence of item IDs can be mapped to a long prompt • How do we obtain a ranked list of next-item recommendations? • Generation of free-form text: need to be careful with hallucination • Probability assignment over the available catalogue
  • 25. A Few Open Questions: Linguistic & Seq. Length Biases, Scales of LM and Creative Prompts • Multi-Token Inference: Length normalization is important. Recommendations are highly sensitive to inference methods. • Linguistic Biases Disentanglement: Item names need not be fluent English. • Scales of Language Models: Model size has significant impact on performance and latency • Prompt Engineering: It's important to design the right prompts
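To illustrate why length normalization matters for multi-token inference, the sketch below ranks catalogue titles by the log-probability a language model assigns to their tokens. It is an illustration under stated assumptions: the per-token log-probabilities are made-up numbers rather than model outputs, the tokenization is hypothetical, and averaging per token is only one possible normalization.

```python
def rank_catalogue(token_logprobs_by_item, normalize=True):
    """Rank items by summed (or per-token averaged) log-probability, best first."""
    scored = []
    for title, logprobs in token_logprobs_by_item.items():
        if not logprobs:
            raise ValueError(f"no token log-probabilities for {title!r}")
        total = sum(logprobs)
        score = total / len(logprobs) if normalize else total
        scored.append((title, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical log-probabilities of each title's tokens given the prompt.
logprobs = {
    "Up": [-1.5],
    "Airplane!": [-1.2, -0.9],
    "Monty Python and the Holy Grail": [-0.8, -0.7, -0.6, -0.9, -0.5, -0.7],
}

raw_order = [title for title, _ in rank_catalogue(logprobs, normalize=False)]
norm_order = [title for title, _ in rank_catalogue(logprobs, normalize=True)]
print(raw_order)   # the shortest title wins without normalization
print(norm_order)  # the long title wins once scores are averaged per token
```

Summed log-probabilities penalize every extra token, so without normalization the ranking is biased toward short titles regardless of relevance.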
  • 26. Some Results: Experiments, Setup and Observations [results on ML 1M]
  • 27. The world after ChatGPT: Unleashing the immense power of Large Language Models
  • 28. Recent Advances in Merging LLMs with RecSys: Fine-Tuning an LLM • M6-Rec [5] • P5 [4]: designed a text-to-text fine-tuning paradigm based on the pre-trained T5. [4] “Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5)”, Shijie Geng et al. RecSys 2022 [5] “M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems”, Zeyu Cui et al. arXiv 2022
  • 29. Recent Advances in Merging LLMs with RecSys: Inference with LLM • NIR [6], Chat-REC [7] and [8] propose to directly recommend using LLMs (inference only). • Most effort is spent on “Prompt Engineering” • Optimal encoding of user context in the prompts • “Out of Vocabulary” problems are solved using techniques such as candidate pools and text matching • Mixed success. Still a long way to go. [6] “Zero-Shot Next-Item Recommendation using Large Pretrained Language Models”, Lei Wang and Ee-Peng Lim. arXiv 2023 [7] “Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System”, Yunfan Gao et al. arXiv 2023 [8] “Is ChatGPT a Good Recommender? A Preliminary Study”, Junling Liu et al. arXiv 2023
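The text-matching idea mentioned on slide 29 can be illustrated by snapping a free-form title generated by an LLM onto the closest catalogue entry. This is a generic sketch using Python's standard `difflib`, not the method of any cited paper; the function name, the cutoff value and the sample catalogue are assumptions.

```python
import difflib


def match_to_catalogue(generated_title, catalogue, cutoff=0.6):
    """Return the catalogue title closest to the generated text, or None."""
    # Compare case-insensitively, but return the catalogue's own spelling.
    by_lowercase = {title.lower(): title for title in catalogue}
    matches = difflib.get_close_matches(
        generated_title.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None


catalogue = ["The Truman Show", "Jaws", "Goldfinger"]
print(match_to_catalogue("the truman show (1998)", catalogue))
print(match_to_catalogue("Xqzv", catalogue))
```

Returning `None` when nothing clears the cutoff lets the caller drop a hallucinated title instead of recommending an item that is not in the catalogue.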
  • 30. Concluding Remarks • With the goal of building foundation models in RecSys, our efforts have been made in two directions: • Extract knowledge from data in similar domains • Use generic world knowledge • We believe the ultimate path is a hybrid of both: ZESRec + LMRecSys
  • 31. Thank you. Happy to take questions now.