This document provides an overview of a tutorial on language models and ChatGPT. It introduces the topics that will be covered in five parts:
Part 1 provides an introduction to neural language models, including their typical architecture, main types, pre-training and fine-tuning, and multi-task learning capabilities.
Part 2 discusses large language models like GPT-3 and prompt engineering.
Part 3 focuses on the technologies behind ChatGPT and related models.
Part 4 reviews recent achievements of ChatGPT and other models on NLP benchmarks and evaluations.
Part 5 examines efforts to develop responsible large language models.
PAKDD2023_Tutorial_T2 (Overview, Part 1, and Part 2)
1. [introductory]
A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT
2023.05.25 (Thu)
PAKDD 2023 Tutorial 2
Jun Suzuki, Tohoku University
Kyosuke Nishida, NTT Human Informatics Laboratories
Naoaki Okazaki, Tokyo Institute of Technology
● [Part 1, Part 2] https://www.fai.cds.tohoku.ac.jp/research/activities/
● [Part 3, Part 4] https://speakerdeck.com/kyoun/pakdd2023-tutorial
● [Part 5] https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
2.
Schedule
● Overview [This part] (10 min)
● Part 1: [introductory] Language models (LMs) (20 min)
● Part 2: Large language models (LLMs) (20 min)
(Short Break: 10 min)
● Part 3: Technologies underlying ChatGPT-like LLMs (20 min)
● Part 4: Recent achievements in ChatGPT-like LLMs (20 min)
(Short Break: 10 min)
● Part 5: Efforts for Responsible LLMs (20 min)
● Q&A (20 min)
3.
Abstract
● Language models (LMs) have a long history in natural language processing (NLP) research. They were mainly used as a text-generation module (or for calculating the likelihood of word sequences) in machine translation and speech recognition systems, together with translation or acoustic models. In the current neural era, LMs have taken on an essential role in the NLP field: they are integrated into models/systems for almost all NLP tasks and provide state-of-the-art performance on conventional NLP benchmarks. Their usage is shifting toward something more like a world model of language, or a general-purpose feature generator for any language-related task. More recently, since the launch of public online services, the public has sometimes treated LMs such as ChatGPT, and its successor GPT-4, as general-purpose AI.
● This tutorial will first introduce some introductory topics we should know when discussing the recent advances in LMs like ChatGPT. We will then briefly introduce the technologies behind ChatGPT-like LMs. Additionally, we will cover ChatGPT's social impacts as discussed recently in public.
4.
Aims of This Tutorial
● This tutorial aims to introduce the factual knowledge needed to discuss the latest LMs to researchers outside the NLP field.
● A further goal is for the audience to understand the strengths and weaknesses of LMs from the viewpoint of LM users, and to support their future studies with the knowledge gained from this tutorial.
● The audience requires no prior knowledge of LMs.
5.
Brief Overview
● This tutorial will begin with some introductory topics that we should know when discussing recent advances in LMs like ChatGPT. [Part 1, Part 2]
◆ https://www.fai.cds.tohoku.ac.jp/research/activities/
● We will then briefly introduce the technologies behind ChatGPT-like LMs and their achievements. [Part 3, Part 4]
◆ https://speakerdeck.com/kyoun/pakdd2023-tutorial
● Finally, we will cover a topic around ChatGPT-like LMs, responsible LLMs, which has recently been discussed by the general public. [Part 5]
◆ https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
6.
Contents (1/5)
● Part 1: [introductory] Neural Language models (LMs)
◆ Traditional Definition of LMs
◆ Typical Base Model Architecture: Transformer
◆ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ Universal Features
◆ Pre-training & Fine-tuning Scheme
◆ Multi-task Learner
7.
Contents (2/5)
● Part 2: Large Language Models (LLMs)
◆ LMs’ Scaling Laws (Parameter size ver.)
◆ LLMs: GPT-3
◆ Prompt Engineering
◆ Achievement and Remaining Issues
8.
Contents (3/5)
● Part 3: Technologies underlying ChatGPT-like LLMs
◆ Codex
◆ InstructGPT and ChatGPT
◆ GPT-4 and ChatGPT plugins
◆ LLaMA
◆ Alpaca and Vicuna
◆ MPT
◆ LLaVA
9.
Contents (4/5)
● Part 4: Recent achievements in ChatGPT-like LLMs
◆ Performance in Natural Language Processing benchmarks and exams
◆ Interesting results in Vision-and-Language Understanding evaluations
◆ Open-source applications powered by LLMs
◆ Remaining Issues
10.
Contents (5/5)
● Part 5: Efforts for Responsible LLMs
◆ High-level overview of potential harms of LLMs
◆ Efforts for reducing potential harms in GPT-4 and PaLM 2
11. [introductory]
A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT
2023.05.25 (Thu)
PAKDD 2023 Tutorial 2
Jun Suzuki, Tohoku University
Kyosuke Nishida, NTT Human Informatics Laboratories
Naoaki Okazaki, Tokyo Institute of Technology
PART 1
12.
Part 1: [introductory]
Neural Language Models (LMs)
We only discuss neural LMs
=> In this talk, “LM” means “neural LM” (unless otherwise specified)
13.
Selected LMs Discussed in This Talk
[Timeline figure, 2018 to 2024. For each model: ① model nickname, ② date first published or announced in public, ③ URL of the announcement (e.g., paper or blog).]
● ELMo, 2018.02, arxiv.org/abs/1802.05365
● GPT-1, 2018.06, openai.com/research/language-unsupervised
● BERT, 2018.10, arxiv.org/abs/1810.04805
● GPT-2, 2019.02, openai.com/blog/better-language-models/
● RoBERTa, 2019.07, arxiv.org/abs/1907.11692
● T5, 2019.10, arxiv.org/abs/1910.10683
● GPT-3, 2020.05, arxiv.org/abs/2005.14165
● FLAN, 2021.09, arxiv.org/abs/2109.01652
● T0, 2021.10, arxiv.org/abs/2110.08207
● Megatron-Turing NLG, 2021.10, microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
● BLOOM, 2022.01, github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml
● Chinchilla, 2022.03, arxiv.org/abs/2203.15556
● PaLM, 2022.04, storage.googleapis.com/pathways-language-model/PaLM-paper.pdf
● OPT, 2022.05, arxiv.org/abs/2205.01068
● LLaMA, 2023.02, research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
● GPT-4, 2023.03, cdn.openai.com/papers/gpt-4.pdf
● PaLM 2, 2023.05, ai.google/static/documents/palm2techreport.pdf
15.
Contents of Part 1
● Part 1: [introductory] Neural Language Models (LMs)
◆ ① Traditional Definition of LMs
◆ ② Typical Base Model Architecture: Transformer
◆ ③ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ ④ Universal Features
◆ ⑤ Pre-training & Fine-tuning Scheme
◆ ⑥ Multi-task Learner
16.
① Traditional Definition of Language Model (LM)
Language Model = Probabilistic model
◆ Modeling probabilities for all next words (a probability distribution over the vocabulary), given a context:
$P_\theta(Y) = \prod_{j=1}^{J+1} P_\theta(y_j \mid Y_{<j})$
◆ $Y_{<j}$: context, the prefix text before the j-th word
◆ $y_j$: target word, the j-th word
[Figure: given the context "the ocean is ___", the conditional probability $P_\theta(y_j \mid Y_{<j})$ assigns a probability to every word in the vocabulary (dark, blue, white, today, fine, excellent, good, scary, ...) as the next word.]
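To make the chain-rule definition concrete, here is a minimal sketch in Python (the toy vocabulary, the hand-set conditional probabilities, and the function name are illustrative assumptions, not part of the tutorial):

```python
import math

# Toy conditional distributions P(y_j | context), for illustration only;
# a real neural LM would compute these with a network.
COND_PROB = {
    ("the",): {"ocean": 0.2, "cat": 0.1},
    ("the", "ocean"): {"is": 0.5},
    ("the", "ocean", "is"): {"dark": 0.3, "blue": 0.4, "scary": 0.05},
}

def sequence_log_prob(words):
    """log P(Y) = sum_j log P(y_j | Y_<j), following the chain rule."""
    total = 0.0
    for j in range(1, len(words)):
        context, target = tuple(words[:j]), words[j]
        p = COND_PROB.get(context, {}).get(target, 1e-9)  # tiny floor for unseen events
        total += math.log(p)
    return total

print(sequence_log_prob(["the", "ocean", "is", "blue"]))
```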
17.
Example of Traditional LMs: n-gram LM
● Example: Google n-gram
https://ai.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html
● Counts of consecutive n words for the context "serve as the" (total count 187491), and the resulting conditional probabilities, estimated as $P(\text{target} \mid \text{context}) = \text{count}(\text{context}, \text{target}) / \text{count}(\text{context})$:

Target word    Count  Probability
incoming          92  0.00049
independent      794  0.00423
index            223  0.00119
indication        72  0.00038
indicator        120  0.00064
indicators        45  0.00024
indispensable    111  0.00059
indispensible     40  0.00021
individual       234  0.00125
industrial        52  0.00028
industry         607  0.00324
info              42  0.00022
informal         102  0.00054
information      838  0.00447
informational     41  0.00022
initial         5331  0.02843
initiating       125  0.00067
initiation        63  0.00034
initiator         81  0.00043
...
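As a hedged illustration of how an n-gram LM turns such counts into conditional probabilities, here is a minimal sketch (the toy corpus and the function ngram_probs are hypothetical; Google's actual pipeline is of course far larger):

```python
from collections import Counter

def ngram_probs(corpus_tokens, n=4):
    """Estimate P(target | context) for (n-1)-word contexts by relative frequency."""
    context_counts, joint_counts = Counter(), Counter()
    for i in range(len(corpus_tokens) - n + 1):
        context = tuple(corpus_tokens[i:i + n - 1])
        target = corpus_tokens[i + n - 1]
        context_counts[context] += 1
        joint_counts[(context, target)] += 1
    return {
        (ctx, tgt): c / context_counts[ctx]
        for (ctx, tgt), c in joint_counts.items()
    }

tokens = "serve as the initial value serve as the initial guess".split()
probs = ngram_probs(tokens, n=4)
print(probs[(("serve", "as", "the"), "initial")])  # 1.0 in this toy corpus
```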
18.
Neural LM
● Fit the probability distribution over the next word, given a context, with a (deep) neural network
[Figure: a deep neural network reads the input training text "… serve as the input …" one word per time step (serve → as → the → input → …). At each step it outputs its current estimated probability distribution over the vocabulary (e.g., incoming 0.03, independent 0.12, index 0.04, indication 0.01, ..., input 0.11, initial 0.21, ...), which is compared against the one-hot correct-answer vector for the target word (input = 1) to obtain the parameter-update direction.]
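A minimal sketch of this training step, assuming PyTorch (the TinyLM model, its sizes, and the toy vocabulary are illustrative assumptions; real LMs use Transformer networks here):

```python
import torch
import torch.nn as nn

vocab = ["serve", "as", "the", "input", "incoming", "independent", "index", "initial"]
stoi = {w: i for i, w in enumerate(vocab)}

# A deliberately tiny next-word predictor: embed the context, average it,
# and project to vocabulary logits.
class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, context_ids):
        h = self.emb(context_ids).mean(dim=1)  # crude context encoding
        return self.out(h)                     # logits over the vocabulary

model = TinyLM(len(vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

context = torch.tensor([[stoi["serve"], stoi["as"], stoi["the"]]])
target = torch.tensor([stoi["input"]])  # correct answer: one-hot "input"

logits = model(context)
loss = nn.functional.cross_entropy(logits, target)  # distribution vs. correct answer
loss.backward()  # gradient gives the parameter-update direction
opt.step()
```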
19.
Typical Usages of LMs (1/2)
● If we have an LM, we can ... ??
1. Evaluate the likelihood of given texts
Example: statistical machine translation (SMT) system
[Figure: for an input text, the translation model proposes (intermediate) candidate output texts; each candidate receives a translation score and an LM score, and the LM score measures how likely (how fluent) each candidate output text is.]
20.
Typical Usages of LMs (2/2)
● If we have an LM, we can ... ??
2. Generate texts
● Estimate next words one by one, auto-regressively, from $P_\theta(y_j \mid Y_{<j})$
● For greedy search, compute $\hat{y}_j = \operatorname{argmax}_{y_j} P_\theta(y_j \mid Y_{<j})$ at each step
21.
Typical Usages of LMs (2/2)
● If we have an LM, we can ... ??
2. Generate texts
● Estimate next words one by one, auto-regressively
● For greedy search, compute $\hat{y}_j = \operatorname{argmax}_{y_j} P_\theta(y_j \mid Y_{<j})$ at each step
Example: generating text using a neural LM (e.g., a generative pre-trained transformer: GPT)
[Figure: given the context "We have never met before , right ? Nice to meet you !", the LM outputs a probability distribution over the vocabulary (A, this, that, ..., meet, have, you, ..., Nice, ..., to, ..., too, ",", ".") at each step; picking a word and appending it to the context yields "Nice to" → "meet" → "you" → ", too ...".]
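A minimal sketch of greedy auto-regressive decoding, assuming PyTorch and any model that returns next-word logits (the function and the reuse of the earlier TinyLM/vocab names are illustrative assumptions):

```python
import torch

@torch.no_grad()
def greedy_generate(model, stoi, itos, prompt_words, max_new=5):
    """Repeatedly pick y_hat = argmax P(y | context) and append it to the context."""
    words = list(prompt_words)
    for _ in range(max_new):
        context = torch.tensor([[stoi[w] for w in words]])
        logits = model(context)               # (1, vocab_size)
        next_id = int(logits.argmax(dim=-1))  # greedy choice
        words.append(itos[next_id])
    return words

# Usage (assuming the TinyLM and vocab from the earlier sketch):
# itos = {i: w for w, i in stoi.items()}
# print(greedy_generate(model, stoi, itos, ["serve", "as", "the"]))
```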
22.
Why Use a Neural LM
● Rough comparison of the general abilities of an n-gram LM and a neural LM:

                                      n-gram LM  Neural LM
Computational cost                    😩          😋
Longer context                        😩          😋
Unseen context                        😩          😋
Performance (in terms of perplexity)  😩          😋
23.
② Typical Base Model Architecture: Transformer
[Figure: the neural-LM training diagram from slide 18, repeated; the "Deep Neural Network" box is the component that the base model architecture fills.]
◆ Surprisingly, all the famous LMs developed recently have chosen Transformers as their base model
24.
Transformer (1/2)
● 2017: first draft paper published
◆ Developed primarily for machine translation tasks
[Screenshot of the arXiv submission history:
From: Ashish Vaswani
[v1] Mon, 12 Jun 2017 17:57:34 UTC
[v2] Mon, 19 Jun 2017 16:49:45 UTC
[v3] Tue, 20 Jun 2017 05:20:02 UTC
[v4] Fri, 30 Jun 2017 17:29:30 UTC
[v5] Wed, 6 Dec 2017 03:30:32 UTC]
25.
Transformer (2/2)
● 2017: first draft paper published
● 2018: selected as the base model architecture of BERT (and GPT)
◆ => fundamental model for language
● 2023 [current]: used as the base model architecture for (almost) all famous LMs
◆ e.g., GPT-3 (ChatGPT), GPT-4, PaLM (PaLM 2), OPT, LLaMA, BLOOM, ...
(GPT: Generative Pre-trained Transformer)
(BERT: Bidirectional Encoder Representations from Transformers)
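The core operation inside every Transformer layer is scaled dot-product attention. Below is a minimal single-head sketch, assuming PyTorch (it omits the multi-head projections, masking, and positional encodings of the full architecture):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (seq, seq) similarity scores
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # weighted mix of value vectors

seq_len, d_model = 5, 8
x = torch.randn(seq_len, d_model)            # token representations
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)                             # torch.Size([5, 8])
```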
26.
[FYI] Transformer for Image Processing
● Also used for image processing tasks
◆ 2020: Vision Transformer (ViT)
◆ Competitive with conventional CNNs
◆ Splits a given image into parts according to a grid (mesh)
https://openreview.net/forum?id=YicbFdNTTy
27.
③ Three Major Model Types
● Encoder type (masked LMs)
◆ e.g., BERT, RoBERTa
◆ Bidirectional
● Decoder type (causal LMs)
◆ e.g., GPT
◆ Unidirectional (left-to-right)
● Encoder-decoder type
◆ e.g., T5
◆ Encoder: bidirectional; decoder: unidirectional
We do not discuss the encoder type in depth in this talk.
Reason: this type has recently become less important in the context of “generative AI”.
The bidirectional/unidirectional distinction comes down to the attention mask, as sketched below.
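A minimal sketch, assuming PyTorch, of the attention masks that characterize the three types (illustrative only; real implementations fold these masks into the attention computation):

```python
import torch

seq = 4

# Encoder (masked LM, e.g., BERT): every position may attend to every other.
bidirectional_mask = torch.ones(seq, seq).bool()

# Decoder (causal LM, e.g., GPT): position j attends only to positions <= j.
causal_mask = torch.tril(torch.ones(seq, seq)).bool()

# Encoder-decoder (e.g., T5): bidirectional mask on the encoder side,
# causal mask on the decoder side (plus full cross-attention to the encoder).
print(causal_mask)
# tensor([[ True, False, False, False],
#         [ True,  True, False, False],
#         [ True,  True,  True, False],
#         [ True,  True,  True,  True]])
```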
28.
④ LMs as Universal Features
● Typical usages of LMs => If we have an LM, we can ... ??
◆ 1. Evaluate the likelihood of given texts
◆ 2. Generate texts
(1 and 2 are identical to traditional usages, as in n-gram LMs)
◆ 3. Use LMs as universal features
● A novel usage of LMs
● Enabled by the representational ability of neural networks
● All LMs explicitly/implicitly take this approach (pioneer of this usage: ELMo)
29.
LMs as Universal Features
● LM training => a neural LM implicitly encodes/captures many linguistic aspects, such as:
◆ The distribution of word occurrences given contexts
◆ Semantically similar/dissimilar expressions
◆ Syntactic/semantic structural information
● Such learned linguistic aspects can help a lot in many NLP tasks
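A minimal sketch of this feature-extraction usage, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint (the mean-pooled sentence vector is a simplistic choice for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The ocean is dark.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual feature vector per token; feed these into any task-specific head.
features = outputs.last_hidden_state    # shape: (1, num_tokens, 768)
sentence_vector = features.mean(dim=1)  # a crude sentence-level feature
print(sentence_vector.shape)            # torch.Size([1, 768])
```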
30.
⑤ Pre-training / Fine-tuning Scheme
● Two-stage training scheme of LMs:
◆ 1. Pre-training (first step): train the LM on extremely many raw texts obtained from the web (large-scale datasets, e.g., news articles, Wikipedia, arXiv, GitHub)
◆ 2. Fine-tuning (second step): train the LM on human-annotated data
● Human-annotated data: relatively small
31.
Viewpoint from the Data Sparsity Problem
● Pre-training / fine-tuning scheme
◆ A variant of transfer learning
◆ Requires less human-annotated data to achieve reasonable task performance
● Frees us from having to prepare a large amount of human annotation data for every NLP task
◆ It is relatively expensive and time-consuming to increase human annotation data
● May solve (or alleviate) an essential problem in the NLP community (called the data sparsity problem) that has remained unsolved for a long time
32.
⑥ LMs as Multitask Learners
● Tackle any NLP task with a single model
◆ Made possible by casting every task as a unified “text-to-text” generation task (see the sketch below)
Example: T5
Copied from https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html
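A minimal sketch of the text-to-text idea, assuming the Hugging Face transformers library and the public t5-small checkpoint (the example prompts are illustrative; the task is named inside the input text itself):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks, one model: the task is specified by a text prefix.
for prompt in [
    "translate English to German: Hello, how are you?",
    "summarize: The United Methodist Church has agreed to a historic split ...",
]:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```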
33.
LMs as Multitask Learners + Pre-training / Fine-tuning Scheme
● A single LM can solve many NLP tasks
[Figure: first step, pre-training on a large-scale dataset from the web (e.g., news articles, Wikipedia, arXiv, GitHub); second step, fine-tuning on MT, QA, and SA data in a unified “text-to-text” format. The resulting single model handles fact checking, sentiment analysis (“○○ is good / is bad”), machine translation (“Hello!” → “こんにちは” [Japanese for “Hello”]), text summarization (e.g., a Japanese digest, “Today's news: stock prices ○○, the ordinary Diet session ..., Tokyo ...”), and question answering.]
● T5 (and partially GPT-2): pioneers of this approach
=> A first trial toward artificial general intelligence in the LM literature
34.
● Summary/Take-Home Message: Part 1
35.
Summary/Take-Home Message: Part 1 (1/2)
● ① Language Model: probabilistic model
◆ Models the probabilities of all possible next words in the vocabulary (a probability distribution), given a context
● ② Base Model Architecture: Transformer
◆ Developed mainly for neural machine translation (NMT)
● ③ Model Types
◆ Encoder type (not discussed in depth in this talk)
◆ Decoder type
◆ Encoder-decoder type
36.
Summary/Take-Home Message: Part 1 (2/2)
● ④ Universal Features
◆ A neural LM implicitly encodes/captures many linguistic aspects
◆ e.g., ELMo, BERT, GPT-2, RoBERTa, ...
● ⑤ Pre-training & Fine-tuning Scheme
◆ A variant of transfer learning (from pre-trained LMs)
◆ Frees us from having to prepare a large amount of human annotation data for every NLP task
◆ Alleviates the data sparsity problem (a long-standing problem in NLP)
● ⑥ Multi-task Learner (toward artificial general intelligence)
◆ Tackles any NLP task with a single model
37. [introductory]
A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT
2023.05.25 (Thu)
PAKDD 2023 Tutorial 2
Jun Suzuki, Tohoku University
Kyosuke Nishida, NTT Human Informatics Laboratories
Naoaki Okazaki, Tokyo Institute of Technology
PART 2
38.
Part 2: Large language models (LLMs)
39.
Selected LMs Discussed in This Talk
[The timeline of selected LMs from Part 1 (slide 13) is shown again here: ① model nickname, ② announcement date, ③ announcement URL, from ELMo (2018.02) through PaLM 2 (2023.05).]
41.
Contents of Part 2
● Part 2: Large Language Models (LLMs)
◆ ① LMs' Scaling Laws (Parameter size ver.)
◆ ② LLMs: GPT-3
◆ ③ Prompt Engineering
◆ ④ Achievement: LLMs and Prompts
◆ ⑤ Remaining Issues: LLMs and Prompts
42.
① LMs' Scaling Laws (Parameter size)
● General tendency: more parameters => better performance!?
[Kaplan+, 2020] Scaling Laws for Neural Language Models, arXiv:2001.08361
[Figure: test loss versus N, the number of parameters (w/o embedding vectors), on a logarithmic scale; smaller loss is better.]
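For reference, the parameter-size scaling law reported in [Kaplan+, 2020] takes the power-law form below (the constants are the paper's fitted values as commonly quoted; treat them as approximate):

```latex
% Test loss L as a power law in the non-embedding parameter count N
% (Kaplan et al., 2020, arXiv:2001.08361):
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
```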
43.
LMs' Scaling Laws: Parameter Size
[Figure: model size in billions of parameters (logarithmic scale) versus release date, 01/2018 to 10/2023, for ELMo, GPT-1, BERT, GPT-2, RoBERTa, T5, GPT-3, Megatron-Turing NLG, BLOOM, Chinchilla, PaLM, OPT, LLaMA, GPT-4 (?), and PaLM 2. Model size grows roughly x10 per year.]
44.
② LLMs: GPT-3 (https://arxiv.org/abs/2005.14165)
● Introduced several new concepts:
◆ 1. Larger model: the first LM with >100B parameters
◆ 2. The potential of prompt engineering (in-context learning)
● Few-shot learning
● Instruction
[Figure: the same model-size timeline as the previous slide; GPT-3 marks a large jump-up in model size (note: the y-axis is log-scale).]
45.
GPT-3: Impact of Using Large Neural Models
● By scaling up the model parameters, LMs gain:
◆ 1. The capacity to memorize much of the training data in its entirety
◆ 2. The ability to produce very fluent sentences, like a human writer
◆ 3. Solvable patterns for simple calculations, reasoning, argumentative thinking, common sense, etc., at the elementary level
(Some might say they “gain the ability to solve/think about simple calculations ...”, but we avoid that phrasing in this talk)
46.
Generate Just Like Human Writers (?)
● A trial of generating fake news
Figure 3.14: The GPT-3-generated news article that humans had the greatest difficulty distinguishing from a human-written article (accuracy: 12%).
Title: United Methodists Agree to Historic Split
Subtitle: Those who oppose gay marriage will form their own denomination
Article: After two days of intense debate, the United Methodist Church has agreed to a historic split
- one that is expected to end in the creation of a new denomination, one that will be "theologically
and socially conservative," according to The Washington Post. The majority of delegates attending
the church's annual General Conference in May voted to strengthen a ban on the ordination of
LGBTQ clergy and to write new rules that will "discipline" clergy who officiate at same-sex weddings.
But those who opposed these measures have a new plan: They say they will form a separate
denomination by 2020, calling their church the Christian Methodist denomination.
The Post notes that the denomination, which claims 12.5 million members, was in the early 20th
century the "largest Protestant denomination in the U.S.,” but that it has been shrinking in recent
decades. The new split will be the second in the church's history. The first occurred in 1968, when
roughly 10 percent of the denomination left to form the Evangelical United Brethren Church. The
Post notes that the proposed split "comes at a critical time for the church, which has been losing
members for years," which has been "pushed toward the brink of a schism over the role of LGBTQ
people in the church." Gay marriage is not the only issue that has divided the church. In 2016, the
denomination was split over ordination of transgender clergy, with the North Pacific regional
conference voting to ban them from serving as clergy, and the South Pacific regional conference
voting to allow them.
The model was given the Title, Subtitle, and the prefix (prompt) word "Article:".
Generate Texts Just Like Human Writers?
- Several reports attest to GPT-3's ability to generate fluent sentences that can potentially fool us
  - A computer science student at UC Berkeley developed a fake blog site using GPT-3; one of the fake blog posts reached #1 on Hacker News.
    https://www.technologyreview.com/2020/08/14/1006780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/
  - Someone posted to a Reddit message board for nearly a week using GPT-3; the developer eventually stopped the posting, but many readers were unaware that the posts were AI-generated.
    https://metastable.org/gpt-3.html
Effectiveness of LLMs
- Task performance on several NLP tasks: https://arxiv.org/abs/2005.14165
③ Prompt Engineering
- As model size becomes larger, fine-tuning cost also becomes larger
- To reduce the computational cost, we want a method that requires no additional training of the pre-trained LLM
- An alternative to the pre-training / fine-tuning scheme
What is a Prompt?
- The starting text given to an LM, from which it generates the subsequent text.
  - Equivalent to the "context" (prefix text) in LMs; "context" is the term used in the GPT-2 & GPT-3 papers.
- Example (standard text completion): prompt "Sensoji is" => Language Model => "the oldest temple in Tokyo ..."
- Example (machine translation experiment), with context format "English sentence = Japanese sentence" (see the code sketch below):
    Hello. = こんにちは。
    It is snowing. = 雪が降っている
    Today it is sunny. = 今日は晴天です。
    Yesterday it rained. =
- Note: confusingly, we now usually call the entire starting (prefix) text the "prompt", not just the last sentence, since we put several different types of text into it. (The usage is reminiscent of the "prompt" of a shell in a terminal.)
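To make the mechanics concrete, here is a minimal sketch of feeding the machine-translation prompt above to a text-completion LM via the Hugging Face transformers library. The model name "gpt2" is only an illustrative stand-in (a model this small will not translate English to Japanese well); the point is that the prompt is nothing more than prefix text that the LM continues.

```python
# Minimal sketch: a "prompt" is just prefix text handed to a completion LM.
# "gpt2" is an illustrative stand-in model, not a claim that it translates well.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "English sentence = Japanese sentence\n"
    "Hello. = こんにちは。\n"
    "It is snowing. = 雪が降っている\n"
    "Today it is sunny. = 今日は晴天です。\n"
    "Yesterday it rained. ="
)

# The model simply continues the prefix text; ideally it imitates the
# "English = Japanese" pattern and emits a translation after the last "=".
out = generator(prompt, max_new_tokens=20, do_sample=False)
print(out[0]["generated_text"])
```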
51. 14
PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu)
More Information in Context
- Conventional LMs
  - Generate sentences following the context words
    => the LM's basic task is text completion
- LM + prompt
  - Generate sentences by giving, as context words, a kind of instruction on how we expect the LM to solve a target task
    e.g., 1. examples (exemplars), 2. a (human-written) instruction explaining the target task (both are sketched below)
  - This elicits the implicit capabilities of the LLM to solve the given task, without changing (no learning of) the model parameters
    - Prerequisite: the capability was already acquired during pre-training (=> in general, the LM cannot solve a task it has not learned)
  - The context is used to "control" generation
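As a hedged illustration of the two options above, here are two prompts for the same made-up target task (sentiment classification of movie reviews): one carries a human-written instruction, the other carries exemplars. The reviews and labels are invented for illustration, and in neither case are any model parameters updated.

```python
# Option 2: a (human-written) instruction explaining the target task.
instruction_prompt = (
    "Decide whether the following movie review is positive or negative.\n"
    "Review: The plot was thin, but the acting saved it.\n"
    "Sentiment:"
)

# Option 1: examples (exemplars) demonstrating the target task.
exemplar_prompt = (
    "Review: A wonderful, heartfelt film. Sentiment: positive\n"
    "Review: Two hours I will never get back. Sentiment: negative\n"
    "Review: The plot was thin, but the acting saved it. Sentiment:"
)

# Either prompt is fed to a frozen LM exactly like ordinary prefix text;
# the prompt only elicits a capability acquired during pre-training.
```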
52. 15
PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu)
Example of Prompt: Few-shot (1/3)
- Equivalent to the "context" (prefix text)
  - But not just a standard prefix text: it orders or instructs the LM on what sentence we want it to generate
[Figure: question answering with a Generative Pre-trained Transformer (GPT). The question q, phrased as an instruction ('What is the oldest temple in Tokyo ? The answer is "'), is given as the prompt, and the model's estimation e continues it with 'Sensoji Temple " <EOS>'. Target task: Q: What is the oldest temple in Tokyo? A: Sensoji / Sensoji Temple]
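Below is a hedged sketch of what a few-shot version of this QA prompt could look like, in the Q:/A: format above. The two demonstration pairs are invented for illustration; the model is expected to continue after the final "A:" with the answer, and no parameters are updated.

```python
# Few-shot QA prompt: demonstrations followed by the actual question.
# The two exemplar Q/A pairs are made up for illustration.
few_shot_qa_prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris\n"
    "\n"
    "Q: What is the tallest mountain in Japan?\n"
    "A: Mount Fuji\n"
    "\n"
    "Q: What is the oldest temple in Tokyo?\n"
    "A:"
)
# Whatever the frozen LM generates after the final "A:" is taken as its
# answer (ideally "Sensoji" / "Sensoji Temple").
```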
Example of Prompt: Few-shot (2/3)
- A setting where the model is given a few demonstrations (examples/exemplars) of the task at inference time
  - No weight updates
https://arxiv.org/abs/2005.14165
Effectiveness of Few-shot Learning
- Task performance on several NLP tasks: https://arxiv.org/abs/2005.14165
Example of Prompt: Chain-of-thought (3/3)
- Scaling up model size alone has not proved sufficient for high performance on challenging tasks
  - Arithmetic, commonsense, and symbolic reasoning
- Chain-of-thought (CoT) prompting improves performance (a sketch follows below)
[Figure: a one-shot CoT prompt, consisting of an example (with intermediate reasoning steps) followed by the question.]
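Here is a minimal sketch of a one-shot chain-of-thought prompt, following Wei et al. (2022). The exemplar answer spells out the intermediate reasoning steps rather than only the final number; the problems are the tennis-ball and cafeteria examples popularized by that paper.

```python
# One-shot CoT prompt: the demonstration includes the reasoning chain,
# not only the answer, so the model imitates step-by-step reasoning.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
    "\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)
# A sufficiently large LM tends to continue with its own reasoning chain
# (e.g., "23 - 20 = 3. 3 + 6 = 9. The answer is 9.") before the answer.
```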
Example of Prompt: Chain-of-thought (3/3, cont.)
- Examples of CoT prompting for challenging tasks
Effectiveness of CoT Prompting
- Performance of CoT prompting on challenging tasks
④ Achievements and Remaining Issues
- Achievements
  - LLMs: generate very fluent, human-like sentences
  - Prompt engineering: solves many language tasks without
    1. parameter updates
    2. preparing task-specific supervised data
Remaining Issues
- Prompts are not consistently effective
  - Their availability is limited and depends on the pre-training data
  - Effective prompts can be somewhat artificial (non-intuitive to humans), since LM tasks are basically text-completion tasks
- LLMs sometimes generate texts that are
  unreliable (not based on facts),
  biased (discriminatory or socially criticized),
  inconsistent (contradictory responses in the same context), or
  harmful (information that should not be publicly shared)
  => addressed by instruction tuning and chat tuning (Part 3)
Summary/Take-Home Message: Part 2
Summary/Take-Home Message: Part 2 (1/2)
- ① LMs' scaling laws
  - Is larger always better?
- ② LLMs: GPT-3
  - Writes as fluently as a human writer
- ③ Prompt engineering (in-context learning)
  - Motivation: avoid the fine-tuning cost
    - No additional training cost for the large model is needed
  - Few-shot examples, chain-of-thought, instructions, ...
Summary/Take-Home Message: Part 2 (2/2)
- ④ Achievements and remaining issues: LLMs and prompts
  - Achievements
    - Very fluent, human-like writing
    - Many language tasks solved without preparing task-specific supervised data
  - Remaining issues
    - Prompts can be somewhat artificial
    - How do we prevent the generation of unreliable, biased, inconsistent, or harmful texts?