Are Human-generated Demonstrations Necessary for
In-context Learning?
Rui Li, Guoyin Wang, Jiwei Li
ICLR 2024
Presentation by
Mengsay Loem (M2)
Paper Reading
2024/02/19
All figures are borrowed from the paper
Mengsay Loem Self-Contemplation Prompting Paper Reading 1 / 26
Summary
Introduces Self-Contemplation prompting (SEC), diverging from
In-Context Learning (ICL) by asking models to generate their own
demos.
cf. Self-ICL [Chen et al., 2023] (EMNLP 2023)
SEC achieves comparable results to ICL across tasks without
human-annotated training data.
Addresses the drawbacks of ICL:
laborious efforts in demonstration crafting
instability of human-crafted demos
Why this paper?
In the realm of LLMs, the content of prompts is crucial. Thus,
automating aspects of the prompting process warrants more focus.
Exploring prompting methods, including Chain of Thought (CoT),
could provide significant insights and serve as useful hints for your
research.
In-Context Learning (ICL)
Large Language Models (LLMs) learn in context from a few
annotated examples as demonstrations.
Conventional ICL depends heavily on human-generated examples,
which presents scalability and quality challenges.
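As a rough illustration of conventional few-shot ICL, the sketch below places hand-written question/answer pairs ahead of the test question. The function name `build_icl_prompt` and the `Question:`/`Answer:` template are assumptions made for this example, not the format used in the paper.

```python
def build_icl_prompt(demos, test_question):
    """Concatenate human-written (question, answer) demos before the test question."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in demos]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Question: {test_question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical hand-written demonstrations.
demos = [("What is 2 + 3?", "5"), ("What is 7 - 4?", "3")]
prompt = build_icl_prompt(demos, "What is 6 + 1?")
print(prompt)
```

Every demonstration here has to be written and checked by a person, which is the cost the paper questions.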
Disadvantages of Conventional ICL
Sensitivity to demonstration selection; there is no standard criterion for
optimal choice [Liu et al., 2022, Lu et al., 2023].
Crafting demonstrations is labor-intensive and complex, often involving
detailed reasoning processes.
Research Question
Do we really need humans to provide LLMs with the demonstrations,
or can LLMs generate demonstrations on their own?
Proposed: Self-Contemplation prompting (SEC) 1/2
LLMs are prompted to reflect on
their own knowledge to generate
relevant demonstrations.
E.g., Please create five
similar multiple choice
questions with choice
labels, choice text and
an answer label (A or B
or C or D).
SEC allows LLMs to
autonomously generate training
examples, decreasing
dependence on curated datasets.
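The two-step flow described on this slide could be sketched roughly as follows. The instruction string is quoted from the slide; the prompt layout, the `sec_answer` function, and the `fake_llm` stand-in are all assumptions made so the example runs without a real model.

```python
from typing import Callable

# Instruction quoted from the slide; the surrounding template is an assumption.
DEMO_INSTRUCTION = (
    "Please create five similar multiple choice questions with choice labels, "
    "choice text and an answer label (A or B or C or D)."
)


def sec_answer(test_question: str, llm: Callable[[str], str]) -> str:
    """Two-step SEC sketch: generate demonstrations, then answer with them in context."""
    # Step 1: ask the model to write its own demonstrations for this input.
    demo_prompt = f"{test_question}\n\n{DEMO_INSTRUCTION}"
    demos = llm(demo_prompt)
    # Step 2: prepend the generated demonstrations to the test question.
    final_prompt = f"{demos}\n\n{test_question}\nAnswer:"
    return llm(final_prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, used only to run the sketch."""
    if DEMO_INSTRUCTION in prompt:
        return "Question: Which is a mammal?\nA. Trout\nB. Whale\nAnswer: B"
    return "A"


result = sec_answer("Which is a bird?\nA. Sparrow\nB. Bat", fake_llm)
print(result)
```

Passing the model as a plain callable keeps the sketch independent of any particular API client.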
Proposed: Self-Contemplation prompting (SEC) 2/2
Integration with CoT: SEC seamlessly adapts to the Chain of
Thought (CoT) strategy.
CoT-SEC prompts LLMs to autonomously generate demonstrations
with the reasoning process.
E.g., Please generate five similar questions with
step-by-step reasoning process and an integer answer
Advantages of SEC:
Eliminates the need for manually crafted demonstrations.
Generates custom demonstrations per test input, enhancing each
example’s support for improved performance across datasets.
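For CoT-SEC, the completion contains reasoning followed by an integer answer, so the final answer has to be pulled out of free text. The sketch below builds the demonstration request from the instruction quoted on the slide and extracts the last integer in a completion. Taking the last integer is a common heuristic assumed here, not a procedure confirmed by the paper, and both function names are hypothetical.

```python
import re

# Instruction quoted from the slide.
COT_DEMO_INSTRUCTION = (
    "Please generate five similar questions with step-by-step reasoning "
    "process and an integer answer"
)


def build_cot_sec_demo_prompt(test_question: str) -> str:
    """Ask the model for demonstrations that include their reasoning."""
    return f"{test_question}\n\n{COT_DEMO_INSTRUCTION}"


def extract_integer_answer(completion: str):
    """Return the last integer in the completion, or None if there is none."""
    # Drop thousands separators so "1,319" is read as 1319.
    matches = re.findall(r"-?\d+", completion.replace(",", ""))
    return int(matches[-1]) if matches else None


answer = extract_integer_answer("She has 3 + 4 = 7 apples. The answer is 7.")
print(answer)
```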
Experiments
Compared to Zero-shot and ICL
(Few-shot)
uses fixed demonstration sets
from related work
Tasks
Arithmetic: MATH (4-shot),
GSM8K (5-shot)
Commonsense Reasoning:
ARC (5-shot)
Multi-task NLU: MMLU,
C-Eval (4-shot)
Code Generation: HumanEval
(4-shot)
Evaluation Metric
(Exact Match) Accuracy
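Exact-match accuracy is the fraction of predictions that equal the reference answer. A minimal sketch, assuming answers are compared as strings after trimming whitespace and lowercasing (the normalization is an assumption; the slides do not specify it):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match their reference after light normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(text):
        # Assumed normalization: ignore case and surrounding whitespace.
        return text.strip().lower()

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical predictions and gold answers.
accuracy = exact_match_accuracy(["B", " c", "42"], ["B", "C", "41"])
print(accuracy)
```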
Results
Outperforms zero-shot
On par with few-shot/CoT ICL
Pivotal Insights
Zero-shot SEC ≈ few-shot ICL and CoT-ICL
Bridges the gap between zero-shot and few-shot learning.
LLMs can independently make decisions, minimizing reliance on
external data.
SEC, inherently a zero-shot and unsupervised method, generates its
own demonstrations.
Challenges the necessity for annotated data.
Suggests LLMs like GPT-3.5/4 and Llama 33B could perform well
independently.
Do We Really No Longer Need Annotated Data?
What about the effect of
Number of Shots?
Model Capability?
Demonstration Quality?
Ablation Study: Number of Shots
SEC often reaches its optimal performance with fewer shots than ICL
SEC’s ability to create input-specific demonstrations eliminates the
need for diverse examples for different test inputs
Ablation Study: Model Capacity
SEC underperforms compared to ICL with less capable models.
Weaker models struggle with instructions and generate lower-quality
examples
Dynamics of Few-Shot Demonstrations’ Accuracy
Incorrect few-shot demonstrations can lead to correct predictions and
vice versa.
Incorrect demonstrations → correct results: the errors are typically
non-critical, such as answer extraction, computation, and
question errors, rather than fundamental logical errors.
Correct demonstrations → incorrect results: the demonstrations might not
align well with the test question, or they may oversimplify, missing the
nuances necessary for accurate predictions.
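One way to look at these four cases is to count how often each combination of demonstration correctness and prediction correctness occurs. The sketch below does this for a handful of made-up records; the `tally_outcomes` name and the data are hypothetical and do not reproduce the paper's numbers.

```python
from collections import Counter


def tally_outcomes(records):
    """Count (demo_correct, prediction_correct) pairs over all test samples."""
    return Counter((bool(demo_ok), bool(pred_ok)) for demo_ok, pred_ok in records)


# Hypothetical records: (were the demos correct?, was the final prediction correct?)
records = [(True, True), (False, True), (True, False), (True, True)]
counts = tally_outcomes(records)
print(counts[(False, True)])  # incorrect demos that still led to a correct prediction
```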
For further analysis on CoT, please refer to:
What Makes Chain-of-Thought Prompting Effective? A Counterfactual Study [Madaan et al., 2023] (EMNLP
2023)
Revealing the Mystery behind Chain of Thought: A Theoretical Perspective [Feng et al., 2023] (NeurIPS 2023)
Why think step by step? Reasoning emerges from the locality of experience [Prystawski et al., 2023] (NeurIPS
2023)
Similarity Between Demonstrations and Test Question
SEC demonstrations are more similar to the test question than ICL demonstrations
This loosely aligns with kNN prompting principles: selection based on
nearest relevance
Similarity (and effectiveness) diminishes with distance from the test
question
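The slide does not say how similarity is measured. As a stand-in, the sketch below uses cosine similarity over word counts to compare a test question with a tailored demonstration and with an unrelated one. Embedding-based similarity would be a more realistic choice; the word-count version is an assumption chosen so the example needs only the standard library.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical test question and demonstrations.
test_q = "How many apples are left after eating two?"
sim_tailored = cosine_similarity(test_q, "How many apples are left after giving away three?")
sim_generic = cosine_similarity(test_q, "What is the capital of France?")
print(sim_tailored > sim_generic)
```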
Summary
Proposed SEC by asking models to generate their own demos for ICL.
Zero-shot SEC ≈ few-shot ICL and CoT-ICL
SEC’s performance may decline in cases where
the model is not capable enough
the test data is poorly represented in the training set
Related topic: Why think step by step? Reasoning emerges from the
locality of experience [Prystawski et al., 2023] (NeurIPS 2023)
Related Work
cf. Self-ICL [Chen et al., 2023] (EMNLP 2023)
SEC and kNN Prompting [Xu et al., 2023] share the idea of using
demonstrations tailored to each test question.
SEC is similar to Self-prompting [Wang et al., 2023] and Auto-CoT
[Zhang et al., 2023] in terms of using LLM-generated demonstrations
but is more flexible without the need for generating numerous samples
or performing clustering and selection.
Comments
Relying on demonstrations generated by LLMs may decrease their
effectiveness in intricate or underrepresented situations.
I would like a more thorough comparison of the effectiveness and
efficiency of these demonstrations against advanced retrieval
techniques for demonstration selection.
I would appreciate further assessment concerning the quality of
demonstrations provided.
Thank you!
Appendix
CoT-SEC vs. CoT-ICL: Case study in GSM8K
Analyzed the accuracy of 1319 test samples in GSM8K under both
CoT-SEC and CoT-ICL.
Results reveal very similar overall performance between the two
methods.
See Appendix B.4 in the paper for details.
Instruction for Demonstration Generation
SEC [Li et al., 2024] vs. Auto-CoT [Zhang et al., 2023]
CoT-SEC’s performance is comparable to Auto-CoT,
even without access to the full test dataset or additional clustering.
Another Limitation
SEC may experience performance degradation in scenarios where the
test data is not sufficiently represented in the training set.
Reference I
Chen, W.-L., Wu, C.-K., Chen, Y.-N., and Chen, H.-H. (2023).
Self-ICL: Zero-shot in-context learning with self-generated demonstrations.
In Bouamor, H., Pino, J., and Bali, K., editors, Proceedings of the 2023 Conference on
Empirical Methods in Natural Language Processing, pages 15651–15662, Singapore.
Association for Computational Linguistics.
Feng, G., Zhang, B., Gu, Y., Ye, H., He, D., and Wang, L. (2023).
Towards revealing the mystery behind chain of thought: A theoretical perspective.
In Thirty-seventh Conference on Neural Information Processing Systems.
Li, R., Wang, G., and Li, J. (2024).
Are human-generated demonstrations necessary for in-context learning?
In The Twelfth International Conference on Learning Representations.
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., and Chen, W. (2022).
What makes good in-context examples for GPT-3?
In Agirre, E., Apidianaki, M., and Vulić, I., editors, Proceedings of Deep Learning Inside
Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for
Deep Learning Architectures, pages 100–114, Dublin, Ireland and Online. Association for
Computational Linguistics.
Reference II
Lu, P., Qiu, L., Chang, K.-W., Wu, Y. N., Zhu, S.-C., Rajpurohit, T., Clark, P., and
Kalyan, A. (2023).
Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning.
In The Eleventh International Conference on Learning Representations.
Madaan, A., Hermann, K., and Yazdanbakhsh, A. (2023).
What makes chain-of-thought prompting effective? a counterfactual study.
In Bouamor, H., Pino, J., and Bali, K., editors, Findings of the Association for
Computational Linguistics: EMNLP 2023, pages 1448–1535, Singapore. Association for
Computational Linguistics.
Prystawski, B., Li, M. Y., and Goodman, N. (2023).
Why think step by step? reasoning emerges from the locality of experience.
In Thirty-seventh Conference on Neural Information Processing Systems.
Wang, J., Li, J., and Zhao, H. (2023).
Self-prompted chain-of-thought on large language models for open-domain multi-hop
reasoning.
In Bouamor, H., Pino, J., and Bali, K., editors, Findings of the Association for
Computational Linguistics: EMNLP 2023, pages 2717–2731, Singapore. Association for
Computational Linguistics.
Reference III
Xu, B., Wang, Q., Mao, Z., Lyu, Y., She, Q., and Zhang, Y. (2023).
kNN prompting: Beyond-context learning with calibration-free nearest neighbor
inference.
In The Eleventh International Conference on Learning Representations.
Zhang, Z., Zhang, A., Li, M., and Smola, A. (2023).
Automatic chain of thought prompting in large language models.
In The Eleventh International Conference on Learning Representations.

Are Human-generated Demonstrations Necessary for In-context Learning?

  • 1. Are Human-generated Demonstrations Necessary for In-context Learning? Rui Li, Guoyin Wang, Jiwei Li ICLR 2024 Presentation1 by Mengsay Loem (M2) Paper Reading 2024/02/19 1 All figures are borrowed from the paper Mengsay Loem Self-Contemplation Prompting Paper Reading 1 / 26
  • 2. Summary Introduces Self-Contemplation prompting (SEC), diverging from In-Context Learning (ICL) by asking models to generate their own demos. cf. Self-ICL [Chen et al., 2023] (EMNLP 2023) SEC achieves comparable results to ICL across tasks without human-annotated training data. Address the drawbacks of ICL: laborious efforts in demonstration crafting instability of human-crafted demos Why this paper? In the realm of LLMs, the content of prompts is crucial. Thus, automating aspects of the prompting process warrants more focus. Exploring prompting methods, including Chain of Thought (CoT), could provide significant insights and serve as useful hints for your research. Mengsay Loem Self-Contemplation Prompting Paper Reading 2 / 26
  • 3. In-Context Learning (ICL) Large Language Models (LLMs) learn in context from a few annotated examples as demonstrations. Conventional ICL depends heavily on human-generated examples, which presents scalability and quality challenges. Mengsay Loem Self-Contemplation Prompting Paper Reading 3 / 26
  • 4. Disadvantages of Conventional ICL. Performance is sensitive to demonstration selection, and there are no standard criteria for the optimal choice [Liu et al., 2022, Lu et al., 2023]. Crafting demonstrations is labor-intensive and complex, often involving detailed reasoning processes.
  • 5. Research Question. Do we really need humans to provide LLMs with the demonstrations, or can LLMs generate demonstrations on their own?
  • 6. Proposed: Self-Contemplation prompting (SEC) 1/2. LLMs are prompted to reflect on their own knowledge to generate relevant demonstrations, e.g., "Please create five similar multiple choice questions with choice labels, choice text and an answer label (A or B or C or D)." SEC allows LLMs to autonomously generate training examples, decreasing dependence on curated datasets.
  • 7. Proposed: Self-Contemplation prompting (SEC) 2/2. Integration with CoT: SEC adapts seamlessly to the Chain of Thought (CoT) strategy. CoT-SEC prompts LLMs to autonomously generate demonstrations that include the reasoning process, e.g., "Please generate five similar questions with step-by-step reasoning process and an integer answer". Advantages of SEC: it eliminates the need for manually crafted demonstrations, and it generates custom demonstrations per test input, so the demonstrations are tailored to each example, supporting improved performance across datasets.
  • 8. Experiments. SEC is compared to zero-shot and ICL (few-shot) baselines; the ICL baselines use fixed demonstration sets from related work. Tasks: Arithmetic: MATH (4-shot), GSM8K (5-shot); Commonsense Reasoning: ARC (5-shot); Multi-task NLU: MMLU, C-Eval (4-shot); Code Generation: HumanEval (4-shot). Evaluation Metric: (Exact Match) Accuracy.
  • 9. Results. SEC outperforms zero-shot prompting and is on par with few-shot ICL and CoT-ICL.
  • 10. Pivotal Insights. Zero-shot SEC ≈ few-shot ICL and CoT-ICL: SEC bridges the gap between zero-shot and few-shot learning, and LLMs can independently make decisions, minimizing reliance on external data. SEC, inherently a zero-shot and unsupervised method, generates its own demonstrations. This challenges the necessity for annotated data and suggests that LLMs like GPT-3.5/4 and Llama 33B could perform well independently.
  • 11. Do We Really No Longer Need Annotated Data? What about the effect of the number of shots? Model capability? Demonstration quality?
  • 12. Ablation Study: Number of Shots. SEC often reaches its optimal performance with fewer shots than ICL. SEC’s ability to create input-specific demonstrations eliminates the need for diverse examples for different test inputs.
  • 13. Ablation Study: Model Capacity. SEC underperforms ICL with less capable models. Weaker models struggle to follow instructions and generate lower-quality examples.
  • 14. Dynamics of Few-Shot Demonstrations’ Accuracy. Incorrect few-shot demonstrations can lead to correct predictions, and vice versa. Incorrect demonstrations → correct results: the demonstrations typically contain non-critical errors, such as answer extraction, computation, and question errors, rather than fundamental logical errors. Correct demonstrations → incorrect results: the demonstrations might not align well with the test question, or they may oversimplify, missing the nuances necessary for accurate predictions. For further analysis on CoT, please refer to: What Makes Chain-of-Thought Prompting Effective? A Counterfactual Study [Madaan et al., 2023] (EMNLP 2023); Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective [Feng et al., 2023] (NeurIPS 2023); Why think step by step? Reasoning emerges from the locality of experience [Prystawski et al., 2023] (NeurIPS 2023).
  • 15. Similarity Between Demonstrations and Test Question. SEC demonstrations are more similar to the test questions than ICL demonstrations are. This (somewhat) aligns with kNN prompting principles, i.e., selection based on nearest relevance. Similarity (and effectiveness) diminishes with distance from the test question.
  • 16. Summary. Proposed SEC, which asks models to generate their own demos for ICL. Zero-shot SEC ≈ few-shot ICL and CoT-ICL. SEC’s performance may decline when the model lacks strength or when the test data is poorly represented in the training set. Related topic: Why think step by step? Reasoning emerges from the locality of experience [Prystawski et al., 2023] (NeurIPS 2023). Related Work: cf. Self-ICL [Chen et al., 2023] (EMNLP 2023). SEC and kNN Prompting [Xu et al., 2023] share the idea of using demonstrations tailored to each test question. SEC is similar to Self-prompting [Wang et al., 2023] and Auto-CoT [Zhang et al., 2023] in using LLM-generated demonstrations, but it is more flexible because it does not need to generate numerous samples or perform clustering and selection.
  • 17. Comments. Relying on demonstrations generated by LLMs may decrease their effectiveness in intricate or underrepresented situations. I am keen on a more thorough comparison of effectiveness and efficiency between these demonstrations and advanced retrieval techniques for demonstration selection. I would appreciate further assessment of the quality of the demonstrations provided.
  • 18. Thank you!
  • 19. Appendix
  • 20. CoT-SEC vs. CoT-ICL: Case study in GSM8K. Analyzed the accuracy on the 1319 test samples in GSM8K under both CoT-SEC and CoT-ICL. Results reveal very similar overall performance between the two methods. See Appendix B.4 in the paper for details.
  • 21. Instruction for Demonstration Generation
  • 22. SEC [Li et al., 2024] vs. Auto-CoT [Zhang et al., 2023]. CoT-SEC’s performance is comparable to Auto-CoT, even without access to the full test dataset and additional clustering.
  • 23. Another Limitation. SEC may experience performance degradation in scenarios where the test data is not sufficiently represented in the training set.
  • 24. Reference I. Chen, W.-L., Wu, C.-K., Chen, Y.-N., and Chen, H.-H. (2023). Self-ICL: Zero-shot in-context learning with self-generated demonstrations. In Bouamor, H., Pino, J., and Bali, K., editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15651–15662, Singapore. Association for Computational Linguistics. Feng, G., Zhang, B., Gu, Y., Ye, H., He, D., and Wang, L. (2023). Towards revealing the mystery behind chain of thought: A theoretical perspective. In Thirty-seventh Conference on Neural Information Processing Systems. Li, R., Wang, G., and Li, J. (2024). Are human-generated demonstrations necessary for in-context learning? In The Twelfth International Conference on Learning Representations. Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., and Chen, W. (2022). What makes good in-context examples for GPT-3? In Agirre, E., Apidianaki, M., and Vulić, I., editors, Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 100–114, Dublin, Ireland and Online. Association for Computational Linguistics.
  • 25. Reference II. Lu, P., Qiu, L., Chang, K.-W., Wu, Y. N., Zhu, S.-C., Rajpurohit, T., Clark, P., and Kalyan, A. (2023). Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. In The Eleventh International Conference on Learning Representations. Madaan, A., Hermann, K., and Yazdanbakhsh, A. (2023). What makes chain-of-thought prompting effective? A counterfactual study. In Bouamor, H., Pino, J., and Bali, K., editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 1448–1535, Singapore. Association for Computational Linguistics. Prystawski, B., Li, M. Y., and Goodman, N. (2023). Why think step by step? Reasoning emerges from the locality of experience. In Thirty-seventh Conference on Neural Information Processing Systems. Wang, J., Li, J., and Zhao, H. (2023). Self-prompted chain-of-thought on large language models for open-domain multi-hop reasoning. In Bouamor, H., Pino, J., and Bali, K., editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2717–2731, Singapore. Association for Computational Linguistics.
  • 26. Reference III. Xu, B., Wang, Q., Mao, Z., Lyu, Y., She, Q., and Zhang, Y. (2023). kNN prompting: Beyond-context learning with calibration-free nearest neighbor inference. In The Eleventh International Conference on Learning Representations. Zhang, Z., Zhang, A., Li, M., and Smola, A. (2023). Automatic chain of thought prompting in large language models. In The Eleventh International Conference on Learning Representations.
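
The two-step SEC procedure from slides 6 and 7 (first ask the model for demonstrations similar to the test question, then answer the test question with those demonstrations in context) could be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the two instruction strings are quoted from the slides, while the function names, the prompt layout, and the `llm` callable (any function mapping a prompt string to a completion string) are assumptions made for this example.

```python
from typing import Callable

# Instruction wording quoted from slides 6 and 7 of the presentation.
SEC_INSTRUCTION = (
    "Please create five similar multiple choice questions with choice labels, "
    "choice text and an answer label (A or B or C or D)."
)
COT_SEC_INSTRUCTION = (
    "Please generate five similar questions with step-by-step reasoning "
    "process and an integer answer"
)


def build_demo_prompt(question: str, instruction: str) -> str:
    """Step 1 prompt: show the test question, then ask for similar demos.

    The exact layout is an assumption; the slides only give the instruction.
    """
    return f"{question}\n\n{instruction}"


def build_answer_prompt(demos: str, question: str) -> str:
    """Step 2 prompt: self-generated demos first, then the test question."""
    return f"{demos.strip()}\n\n{question}\nAnswer:"


def sec_answer(
    question: str,
    llm: Callable[[str], str],  # hypothetical text-in, text-out model wrapper
    instruction: str = SEC_INSTRUCTION,
) -> str:
    """Run both SEC steps with the same model and return its final output."""
    demos = llm(build_demo_prompt(question, instruction))
    return llm(build_answer_prompt(demos, question))
```

Passing `COT_SEC_INSTRUCTION` as `instruction` gives the CoT-SEC variant, in which the generated demonstrations carry their own reasoning steps. Because the demonstrations are regenerated for every test question, the sketch makes two model calls per question.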