Benchmark comparison of Large Language Models - Matej Varga
Benchmark comparison of Large Language Models. Check out which one performed best, according to the study by Ye, Seonghyeon, et al., "FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets."
Understanding GenAI/LLM and What is Google Offering - Felix Goh, NUS-ISS
With the recent buzz around Generative AI and Large Language Models, the question is to what extent these technologies can be applied at work or in your studies, and how easy it is to manage and develop your own models. Hear from our guest speaker from Google as he shares insights into how industries are evolving with these trends and some of Google's offerings, from Duet AI in Google Workspace to the GenAI App Builder on Google Cloud.
In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put in order all the terms (OpenAI, GPT-3, ChatGPT, Codex, DALL-E, etc.) and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of the Azure OpenAI Service and respective use cases that might inspire you to either optimize your product or build a completely new one.
Conversational AI with Transformer Models - Databricks
With the advancements in Artificial Intelligence (AI) and cognitive technologies, automation has been a key prospect for many enterprises across various domains. Conversational AI is one such area in which many organizations are investing heavily.
In this session, we discuss the building blocks of conversational agents and a Natural Language Understanding engine built on transformer models, which have proven to offer state-of-the-art results in standard NLP tasks.
We will first talk about the advantages of Transformer models over RNN/LSTM models, and later about knowledge distillation and model compression techniques to make these parameter-heavy models work in production environments with limited resources.
Key takeaways:
Understanding the building blocks & flow of Conversational Agents.
Advantages of Transformer-based models over RNNs/LSTMs
Knowledge distillation techniques
Different model compression techniques, including quantization (see the sketch after this list)
Sample code in PyTorch & TF2
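For instance, one of the compression techniques named above, dynamic quantization, takes only a few lines in PyTorch. This is a minimal sketch, not the session's own sample code; the model architecture and layer sizes are arbitrary placeholders:

```python
# Minimal sketch: dynamic quantization of a model's Linear layers to int8.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)  # Linear layers are replaced by dynamically quantized versions
```

Dynamic quantization converts weights to int8 ahead of time and quantizes activations on the fly, which shrinks the model and can speed up inference on CPU with no retraining.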
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck was prepared with an audience from the financial industry in mind, its content remains broadly applicable.
(Note: Discover a slightly updated version of this deck at slideshare.net/LoicMerckel/introduction-to-llms.)
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck was prepared with an audience from the financial industry in mind, its content remains broadly applicable.
(This updated version builds on our previous deck: slideshare.net/LoicMerckel/intro-to-llms.)
And then there were ... Large Language Models - Leon Dohmen
It is not often, even in the ICT world, that one witnesses a revolution. The rise of the Personal Computer, the rise of mobile telephony and, of course, the rise of the Internet are some of those revolutions. So what is ChatGPT really? Is ChatGPT also such a revolution? And like any revolution, does ChatGPT have its winners and losers? And who are they? How do we ensure that ChatGPT contributes to a positive impulse for "Smart Humanity"?
During keynotes on April 3 and 13, 2023, Piek Vossen explained the impact of Large Language Models like ChatGPT.
Prof. Piek Th.J.M. Vossen is full professor of Computational Lexicology at the Faculty of Humanities, Department of Language, Literature and Communication (LCC) at VU Amsterdam:
What is ChatGPT? What technology and thought processes underlie it? What are its consequences? What choices are being made? In the presentation, Piek elaborates on the basic principles behind Large Language Models and how they serve as a basis for deep learning models that are fine-tuned for specific tasks. He also discusses GPT, the specific variant that underlies ChatGPT, covering what ChatGPT can and cannot do, what it is good for, and what the risks are.
Foundation models are pre-trained models that can be fine-tuned on specific tasks or domains. These highly adaptable and high-performing models find applications across diverse domains, including Natural Language Processing (NLP), computer vision, and multimodal tasks.
LangChain intro; Keymate.AI Search plugin for ChatGPT; how to use the LangChain library; how to implement similar functionality in the programming language of your choice; example LangChain applications.
The presentation revolves around the concept of "LangChain": an innovative framework designed to "chain" together different components to create more advanced use cases around Large Language Models (LLMs). The idea is to leverage the power of LLMs to tackle complex problems and generate solutions that are more than the sum of their parts (a minimal sketch of the chaining idea follows this overview).
One of the key features of the presentation is the application of the "Keymate.AI Search" plugin in conjunction with the Reasoning and Acting Chain of Thought (ReAct) framework. The presenter encourages the audience to utilize these tools to generate reasoning traces and actions. The ReAct framework, learned from an initial search, is then applied to these traces and actions, demonstrating the potential of LLMs to learn and apply complex frameworks.
The presentation also delves into the impact of climate change on biodiversity. The presenter prompts the audience to look up the latest research on this topic and summarize the key findings. This exercise not only highlights the importance of climate change but also demonstrates the capabilities of LLMs in researching and summarizing complex topics.
The presentation concludes with several key takeaways. The presenter emphasizes that specialized custom solutions work best and suggests a bottom-up approach to expert systems. However, they caution that over-abstraction can lead to leakages, causing time and money limits to be hit early and tasks to fail or require many iterations. The presenter also notes that while prompt engineering is important, it is not necessary to over-optimize if the LLM is clever. The presentation ends on a hopeful note, expressing a need for more clever LLMs and acknowledging that good applications are rare but achievable.
Overall, the presentation provides a comprehensive overview of the LangChain framework, its applications, and the potential of LLMs in solving complex problems. It serves as a call to action for the audience to explore these tools and frameworks.
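As a rough illustration of the chaining idea referenced above, here is a sketch using LangChain's classic LLMChain API; module paths and method names may differ in current releases, and an OpenAI API key is assumed to be configured. The prompt reuses the biodiversity example from the presentation:

```python
# Sketch: chaining a prompt template to an LLM with LangChain's classic API.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Summarize the latest research on {topic} in three bullet points.",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run(topic="the impact of climate change on biodiversity"))
```

The chain simply formats the template with the given variables and passes the result to the LLM; more advanced chains compose several such steps, which is the "more than the sum of their parts" idea.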
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers - Ivo Andreev
Have you ever wondered why GPT models work? Do you ask questions like:
◉ How does GPT work? Why does the same problem receive different answers for different users? Is there a way to improve explainability?
◉ Can a GPT model provide its sources? Why does Bing Chat work differently? How can I get better performance and improved completions?
◉ How can I work with data in my enterprise? What practical business cases is a generative AI model a good fit for?
If you are tired of sessions just scratching the surface of OpenAI GPT, this one will go deeper and answer questions like why, why not and how.
Key Terms; ChatGPT Enterprise; Top Questions; Enterprise Data; Azure Search; Functions; Embeddings; Context Encoding; General Intelligence; Emerging Abilities; Chain of Thought; Plugins; Multimodal with DALL-E; Project Florence
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in... - David Talby
An April 2023 presentation to the AMIA working group on natural language processing. The talk focuses on three current trends in NLP and how they apply in healthcare: Large language models, No-code, and Responsible AI.
Neural Language Generation Head to Toe - Hady Elsahar
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and how to model probabilities of sequences, to recurrent neural networks, to the large Transformer models you have seen in the news like GPT-2/GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Using Large Language Models in 10 Lines of Code - Gautier Marti
Modern NLP models can be daunting: no more bag-of-words, but complex neural network architectures with billions of parameters. Engineers, financial analysts, entrepreneurs, and mere tinkerers, fear not! You can get started with as little as 10 lines of code (a sketch follows below).
Presentation prepared for the Abu Dhabi Machine Learning Meetup Season 3 Episode 3 hosted at ADGM in Abu Dhabi.
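As a rough illustration of that premise (a sketch with the Hugging Face transformers library, not the deck's own code; the model choice is an arbitrary assumption), a working text-generation call fits in well under ten lines:

```python
# Text generation with an off-the-shelf model in well under 10 lines.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
output = generator("Modern NLP models are", max_new_tokens=40)
print(output[0]["generated_text"])
```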
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ... - rahul_net
ChatGPT has taken the world of natural language processing by storm, and as an experienced AI practitioner, enterprise architect, and technologist with over two decades of experience, I'm excited to share my insights on how this innovative powerhouse is designed from an AI components perspective. In this post, I'll provide a fresh take on the key components that make ChatGPT a powerful conversational AI tool, including its use of the Transformer architecture, pre-training on large amounts of text data, and fine-tuning with human feedback. With ChatGPT's massive success, there's no doubt that it's changing the way we think about language and conversation. So, whether you're a seasoned pro or new to the world of AI, my post will provide a valuable perspective on this fascinating technology. Check out my slides to learn more!
Large Language Models Are Reasoning Teachers
1. Large Language Models Are
Reasoning Teachers
Namgyu Ho Laura Schmid Se-Young Yun
KAIST AI
🧑🏫
2. Short Summary
§ Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning
… in huge models with over 100B 🤯 parameters.
3. Short Summary
§ Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning
… in huge models with over 400GB VRAM 💰.
4. Short Summary
§ Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning
… in huge models with over 100B 🤯 parameters.
§ We use GPT-3 175B as a reasoning teacher 🧑🏫
to teach smaller students with 70M‒6.7B parameters.
5. Short Summary
§ Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning
… in huge models with over 100B 🤯 parameters.
§ We use GPT-3 175B as a reasoning teacher 🧑🏫
to teach smaller students with 70M‒6.7B parameters.
§ Diverse reasoning ✨ is a simple way to boost teaching.
6. Short Summary
§ Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning
… in huge models with over 100B 🤯 parameters.
§ We use GPT-3 175B as a reasoning teacher 🧑🏫
to teach smaller students with 70M‒6.7B parameters.
§ Diverse reasoning ✨ is a simple way to boost teaching.
§ Extensive analysis 🕵 on the emergence of reasoning.
7. Introduction
§ Background: chain-of-thought (CoT) prompting [Wei 2022] elicits models to
solve complex reasoning tasks step-by-step
§ Standard prompting is insufficient.
8. Introduction
§ Background: chain-of-thought (CoT) prompting [Wei 2022] elicits models to
solve complex reasoning tasks step-by-step
§ Standard prompting is insufficient.
§ Limitation: CoT prompting is only applicable to very large models such as GPT-3 175B and PaLM.
9. Introduction
§ Background: chain-of-thought (CoT) prompting [Wei 2022] elicits models to
solve complex reasoning tasks step-by-step
§ Standard prompting is insufficient.
§ Limitation: CoT prompting is only applicable to very large models such as GPT-3 175B and PaLM.
§ Solution: apply CoT prompting on very large models to generate training data
on complex reasoning for smaller models.
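In code, Step 1 of this solution amounts to prompting the teacher with Zero-shot-CoT and collecting the generated reasoning. A minimal sketch follows; the OpenAI client usage and model name are illustrative assumptions, not the authors' released code:

```python
# Sketch of reasoning generation with a Zero-shot-CoT prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_reasoning(question: str) -> str:
    """Ask a large teacher model to solve a question step by step."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    response = client.completions.create(
        model="davinci-002",  # stand-in for a GPT-3 175B teacher
        prompt=prompt,
        max_tokens=256,
        temperature=0.0,  # greedy decoding for a single reasoning sample
    )
    return response.choices[0].text
```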
10. Method: Fine-tune-CoT
[Figure: the Fine-tune-CoT pipeline, illustrated with a worked example.]
Original sample. Question: "A pet store had 56 puppies. In one day they sold 24 of them and put the rest into cages with 4 in each cage. How many cages did they use?" Answer: 8.
Step 1. Reasoning Generation: the large 175B teacher model receives the Zero-shot-CoT prompt "Q: [question] A: Let's think step by step." and generates a completion: "The store started with 56 puppies. 24 of them were sold, so that means that there are now 32 puppies left. Since there are 4 puppies in each cage, that means that the store now has 8 cages."
Step 2. Curation: generated completions are curated into reasoning samples and assembled into a dataset, formatted as prompt "[question] ###" and completion "[generated reasoning] --> 8 END".
Step 3. Fine-tuning: the small student model is fine-tuned on the curated dataset.
Diverse reasoning: multiple reasoning samples can be generated for each original sample.
11. Method: Fine-tune-CoT
[Figure: the same pipeline, with the diverse reasoning ✨ step highlighted.]
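Steps 2 and 3 can be made concrete in a few lines. Below is a minimal sketch of curation and fine-tuning data preparation under the figure's prompt/completion format, with its "###" separator and "END" stop marker; the answer-extraction heuristic and helper names are illustrative assumptions, not the paper's released code:

```python
# Sketch of curation (Step 2) and fine-tuning data preparation (Step 3).
import json
import re

def extract_answer(completion: str) -> str:
    """Heuristic: take the last number in the completion as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def curate(question: str, gold: str, completions: list) -> list:
    """Keep completions that reach the gold answer; format for fine-tuning."""
    samples = []
    for text in completions:
        if extract_answer(text) == gold:
            samples.append({
                "prompt": f"{question} ###",
                "completion": f" {text.strip()} --> {gold} END",
            })
    return samples

# Diverse reasoning: sample several completions per question (temperature > 0)
# so that a single original sample can yield many distinct reasoning samples.
question = ("A pet store had 56 puppies. In one day they sold 24 of them and "
            "put the rest into cages with 4 in each cage. "
            "How many cages did they use?")
completions = [
    "The store started with 56 puppies. 24 of them were sold, so there are "
    "now 32 puppies left. Since there are 4 puppies in each cage, "
    "the store now has 8 cages.",
]
with open("finetune_cot.jsonl", "w") as f:
    for sample in curate(question, "8", completions):
        f.write(json.dumps(sample) + "\n")
```

The resulting JSONL file of prompt/completion pairs is what the small student model is fine-tuned on in Step 3.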
15. Results
§ Fine-tune-CoT enables significant reasoning capabilities in small models.
§ Diverse reasoning boosts performance substantially.
16. Results
§ Performance Scalability
1. Diverse reasoning
2. Dataset size
3. Teacher performance
4. Student model scale
[Slides 17-19 repeat the Performance Scalability list above, each shown with its accompanying charts.]
20. Results
§ Fine-tune-CoT enables significant reasoning capabilities in small models.
§ Diverse reasoning boosts performance substantially.
§ Performance is highly scalable under Fine-tune-CoT.
21. Results
§ Fine-tune-CoT enables significant reasoning capabilities in small models.
§ Diverse reasoning boosts performance substantially.
§ Performance is highly scalable under Fine-tune-CoT.
§ Tradeoffs must be considered between
§ Development-time cost: diverse reasoning, dataset size, teacher model
§ Inference-time cost: student model
22. (Analysis & Discussion)
§ Cost analysis of data acquisition
§ How to filter teacher reasoning samples. Do we need to?
§ Emergence of reasoning in small language models
§ Distillation of emergent abilities
§ Connection with knowledge distillation
23. Takeaways
§ Simple distillation can transfer 🧚 reasoning abilities from very large teachers
to small students (<1B parameters) for a single domain.
§ What about other emergent abilities?
§ Fine-tune-CoT with diverse reasoning is an accessible and effective approach
which is highly scalable.
§ Distillation poses a tradeoff between development costs and inference
cost/quality.
24. Large Language Models Are
Reasoning Teachers
Namgyu Ho Laura Schmid Se-Young Yun
KAIST AI
🧑🏫
Paper
§ Why does reasoning emerge in small models?
§ Results on GPT-2, T5
Code
§ All code and data
§ $1000+ worth of teacher data
with ❤ from OSI LAB @ KAIST.