Working in NLP in the Age of Large Language Models
An Historical Perspective
Zachary Brown, Charlottesville Data Science Meetup, 2023-09-18
Outline
Setting the Scene
Who's who in NLP
The AlexNet Moment
Context is King
Sequences Abound
Attention is Everything
Multi-mode, multi-task, multi-billion
A New World
Setting the Scene
Generative AI has taken the world by storm over the past year, driven largely by the groundbreaking performance of a small number of closed-source models
These recent advances have their roots in over a decade of accumulating foundational research
Advances in the technology have opened up a variety of novel use cases, and the hype has caused a massive shift in both expectations and allowances for model performance
(Partial) Generative AI Landscape
Companies and Models
Tooling
Novelty (Across Many Dimensions)
The good: Fun with content generation, trying to take over the world, fun generative agents. Also, does better on tests than you do…
The harm: Hallucinations, disinformation, workforce impacts, misguided usage, environmental impacts
The legal cases…
Hype Cycle (annotated: "we are here")
Foundational Roots of LLMs
● 2001: First Neural Language Model
● 2010: RNNs for Language Modeling
● 2013: Contextual Word Embeddings
● 2015: Attention Mechanism
● 2017: Transformer Architecture
Questions to Cover
What previous advances have led us here?
How does the impact of previous advances compare to what's happening right now?
What have these changes meant for those in this field, and in the periphery of this field?
Who's Who in NLP
Engineers and Architects
Who: Software engineers, architects, DevOps and data engineers
What: Core contributors required to mature technology beyond specialized startups to enterprise grade / scale
Researchers and Practitioners
Who: Machine learning researchers and engineers, data scientists, computational linguists
What: Driving advances in tech and/or knowledgeable enough to immediately leverage them
Business Interests and Specialists
Who: C-suite members, enterprise technical leaders, founders and investors, and domain specialists (research)
What: Recognizing maturity and leveraging advances in tech for relevant use cases
The AlexNet Moment
(+ history, 2001 - 2012)
I’d argue that one of the most important moments leading to
NLP having broader impacts outside of academic/specialist
communities wasn’t an advancement in NLP at all…
But first, a bit of history on NLP research…
Early Neural Methods for Language Modeling
One of the foundational tasks for NLP is language modeling: predicting the next word in a sequence given the words that precede it (sketched in code below)
● In 2001, Bengio et al. introduced an early neural model for next-token prediction
● In 2010, Mikolov et al. explored the application of RNNs for language modeling*
* Extensions such as the LSTM gained massive popularity in subsequent years
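To make the task concrete, below is a minimal sketch of a feed-forward next-word predictor in PyTorch. This is an illustrative toy (hypothetical vocabulary, untrained weights), not the architecture from the papers above: a fixed window of previous words is embedded, concatenated, and mapped to a distribution over the vocabulary.

```python
# Toy neural language model: predict the next word from a fixed context window.
import torch
import torch.nn as nn

vocab = ["<pad>", "the", "cat", "sat", "on", "mat"]
stoi = {w: i for i, w in enumerate(vocab)}

class TinyLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32, context=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Concatenate the context embeddings, as in early feed-forward LMs
        self.ff = nn.Sequential(
            nn.Linear(context * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, context_ids):      # (batch, context)
        e = self.embed(context_ids)      # (batch, context, embed_dim)
        return self.ff(e.flatten(1))     # (batch, vocab_size) logits

model = TinyLM(len(vocab))
ctx = torch.tensor([[stoi["the"], stoi["cat"], stoi["sat"]]])
probs = torch.softmax(model(ctx), dim=-1)  # distribution over the next word
print(probs.shape)  # torch.Size([1, 6])
```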
Impact?
Researchers and Practitioners
Exploration of new techniques is really pushing the boundaries of what's possible, but practitioners often need to implement models from scratch, with few standard libraries available
Engineers and Architects
(No notable impact yet)
Business Interests and Specialists
(No notable impact yet)
The AlexNet Moment (2012)
In 2012, the AlexNet architecture won the ImageNet competition, reducing top-5 error by roughly 10 percentage points relative to the runner-up (15.3% vs. 26.2%).
This concretely demonstrated the promise of neural networks in solving problems that could provide tangible business value
The resurgent popularity of neural networks had broad implications beyond just the researchers and practitioners working in this space
Impact?
Engineers and Architects
Huge opportunity to build out standard frameworks to support deep learning research and development
Researchers and Practitioners
Purpose-built neural net architectures have the potential to substantially outperform prior methods and should be more thoroughly explored for NLP use cases
Business Interests and Specialists
Early signals that neural nets can provide substantial business and research value; this is an important area for early investment
Context Is King (2013)
In 2013, Mikolov et al. demonstrated word2vec, a technique that efficiently produced word embeddings at scale from a large, unlabeled corpus, capturing how words are used in context
This work was followed in 2014 by the GloVe embedding method (Pennington et al.), which leverages global co-occurrence statistics to generate embeddings
Both research groups made these sets of pre-trained word embeddings publicly available under the Apache 2.0 license
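As an illustration of how accessible this made things, here is a minimal sketch of training word2vec-style embeddings with the gensim library (assuming gensim 4.x; the three-sentence corpus is a toy stand-in for a large one):

```python
# Train skip-gram word2vec embeddings on a toy corpus with gensim.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["king"]                        # 50-dim embedding for "king"
print(model.wv.most_similar("king", topn=2))  # nearest neighbours in the space
```

Nearest neighbours in the learned space surface semantic relationships without any labeled data.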
Impact?
Engineers and Architects
Interesting new opportunities and challenges for large-scale unstructured data sets. New packages are emerging for generating and using publicly available model artifacts
Researchers and Practitioners
Unsupervised pre-training has the potential to capture interesting semantic relationships without the need for expensive (and error-prone) human-labeled data
Business Interests and Specialists
My unstructured data has inherent value that can be extracted in an automated way. There's a new ecosystem emerging of privately funded research efforts producing and releasing valuable IP
Sequences Abound (2014 - 2015)
In 2014, Sutskever et al. from the Google Brain team introduced a novel approach for leveraging neural nets to map sequences to sequences. This had major implications for neural machine translation, among other tasks.
The next year, Bahdanau et al. published a novel approach to neural machine translation, introducing the attention mechanism.
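The attention idea can be sketched in a few lines of numpy. The weights and shapes below are illustrative (random, untrained), but the computation mirrors the additive scoring Bahdanau et al. describe: the decoder state scores each encoder state, and a softmax over the scores yields a weighted context vector.

```python
# Additive (Bahdanau-style) attention over encoder states.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                            # source length, hidden size
enc_states = rng.normal(size=(T, d))   # encoder hidden states h_1..h_T
dec_state = rng.normal(size=(d,))      # current decoder hidden state s_t

W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=(d,))

# score(s_t, h_i) = v^T tanh(W1 h_i + W2 s_t)
scores = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v  # (T,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                                    # softmax
context = weights @ enc_states                              # (d,) context vector
```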
Impact?
Engineers and Architects
Neural nets are starting to pop up in a variety of use cases. I should check out this new TensorFlow thing
Researchers and Practitioners
We can now tackle seq2seq problems, and the attention mechanism shows great promise in letting a model architecture learn which context is relevant for word prediction
Business Interests and Specialists
Large commercial R&D investments are producing truly novel tech that's driving a step change in capabilities. New business opportunities for startups, new investments for enterprises
Attention is Everything (2017 - 2018)
In 2017, new work from Vaswani et al. demonstrated that "Attention is All You Need," extending the attention mechanism proposed several years earlier into stacked blocks of multi-headed attention: the transformer.
The next year, the promise of transformers was firmly established with the release of both the original BERT paper from the Google AI Language team and the original GPT paper from a team at OpenAI.
Attention is All You Need
Blocks of (multi-head) self-attention are the key component of the transformer's encoder and decoder blocks, allowing the model to learn deep contextual representations of the input tokens (bidirectional in the encoder).
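The heart of the transformer block is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Below is a minimal single-head PyTorch sketch, omitting the learned projections, masking, and multi-head splitting of the full architecture:

```python
# Scaled dot-product attention: each position attends to every position.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # attention weights
    return weights @ v                                 # contextualized values

q = k = v = torch.randn(1, 10, 64)  # self-attention: all from the same input
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 10, 64])
```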
BERT and the Encoders
Bidirectional Encoder Representations from Transformers (BERT) demonstrated that transformer encoder-only models can be efficiently trained through a two-step process of unsupervised pre-training followed by task-specific fine-tuning.
The masked-language-model (MLM) pretraining paradigm opened the door for leveraging massive textual corpora to produce extremely performant models, while fine-tuning allowed practitioners to directly benefit from the substantial investments of large industry research groups.
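For a feel of what MLM pre-training buys, here is a short sketch using the Hugging Face transformers library (weights download on first run; the example sentence is arbitrary):

```python
# Query a pre-trained BERT masked-language model for the masked token.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The capital of France is [MASK].", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```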
GPT
The original GPT paper followed a similar "pretrain then fine-tune" approach using a decoder-only transformer architecture.
Pretraining was carried out with an autoregressive language-modeling objective, with a variety of tasks for subsequent fine-tuning.
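A minimal sketch of autoregressive generation with the openly released GPT-2 weights, via the same transformers pipeline API (prompt and sampling settings are illustrative):

```python
# Sample a continuation from GPT-2, one token at a time.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("In the age of large language models,",
                max_new_tokens=30, do_sample=True, temperature=0.8)
print(out[0]["generated_text"])
```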
Impact?
Engineers and Architects
Deep learning toolkits are becoming more broadly available, and non-specialists can now experiment and build useful machine learning systems
Researchers and Practitioners
Transformers are an incredibly robust tool for learning deep contextual representations for language tasks. We should explore everything we can with transformers and pre-training paradigms
Business Interests and Specialists
The trend of open-sourcing valuable IP is only accelerating, and an ecosystem is rapidly developing for AI. What tooling needs to be built? What NLP use cases does my organization have?
Multi-mode, Multi-task, Multi-billion (2019 - 2021)
In the wake of the success of models such as BERT and GPT, research interests shifted to exploring, extending, and augmenting various paradigms introduced in these recent works, such as:
● Extensions to the attention mechanism
● Encoder-decoder architectures
● Pre-training and fine-tuning paradigms
● Larger and smaller (more efficient) models
● Multi-modal applications
Extend Your Attention
The original attention mechanism is robust, but computationally expensive as sequence lengths grow. Models such as Longformer, Reformer, Performer, etc. explored various methods for extending the attention mechanism to longer sequence lengths.
Encode and Decode
While encoder-only and decoder-only models such as BERT and GPT demonstrated great promise in their own right, extensive research efforts focused on leveraging full encoder-decoder transformer architectures for sequence-to-sequence tasks such as translation, reading comprehension, summarization, etc.
Models like BART, T5, Pegasus, and ULM all leveraged encoder-decoder models along with novel training paradigms to produce performant models across a variety of tasks.
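T5, for example, casts every task as text-to-text. A short sketch of running summarization through the transformers pipeline API (model size and length limits here are illustrative):

```python
# Run an encoder-decoder (T5) model on a seq2seq task: summarization.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")
text = ("Large language models have their roots in over a decade of "
        "foundational research, from neural language models to transformers.")
print(summarizer(text, max_length=20, min_length=5)[0]["summary_text"])
```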
Task Variety, Instructions, and Alignment
Extending the fine-tuning paradigms introduced in previous work, many groups shifted focus to using a single architecture to perform a variety of disparate tasks.
Models like T5 probed the limits of multi-task transfer learning, while the family of GPT models evolved (GPT-2, GPT-3), demonstrating the benefits of multi-task learning and highlighting emergent capabilities of larger models such as few-shot learning and generalization.
"Fine-Tuning Language Models from Human Preferences" demonstrated that human feedback can play a key role in aligning generative model outputs with human expectations.
Bigger (and Smaller) Models
While some work sought to make transformer models smaller and more efficient (pruning, distillation, quantization), other works focused on scaling up to much larger models (and larger training datasets).
Novel works explored new scaling laws for the era of large language models.
Multi-Modal Models
A huge body of research also emerged around multi-modal applications of transformers (see Xu et al. for a recent survey).
One particularly visible application of transformers to audio was the Whisper model released by OpenAI, which leveraged a relatively straightforward transformer and a huge volume of data to produce a performant speech-to-text model.
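Using it takes only a few lines with OpenAI's open-source whisper package; a sketch (the audio filename is a placeholder):

```python
# Transcribe speech to text with a pre-trained Whisper model.
import whisper

model = whisper.load_model("base")   # downloads weights on first run
result = model.transcribe("audio.mp3")
print(result["text"])
```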
Impact?
Engineers and Architects
With models getting so big, there are novel challenges for ML training and inference. Access to public models allows us to incorporate ML into nearly any application with zero training
Researchers and Practitioners
Wow… Also, everything is a transformer.
Business Interests and Specialists
I can build a new business on top of open source machine learning easier than ever before, and so can everyone else. How can I leverage our data and talent to apply this new tech to our business?
A New World (2022 - Present)
In November 2022, OpenAI announced the public release of ChatGPT, a large (and notably unspecified) generative model fine-tuned through reinforcement learning from human feedback (RLHF).
For many, this system was the first glimpse into the technological advances of the past decade and the tangible utility these models can provide.
The model demonstrated a step change over previously demonstrated generative language modeling capabilities, causing a noticeable shift in research interest and commercial applications of NLP.
The Dominance of GPT
In early 2023, there were seemingly no competitive systems available to the public that could match the performance of this new model.
Aside from the novelty of the system in information recall or content-generation use cases, the few-shot and zero-shot performance of the model is remarkable.
Coupled with restrictive terms of service for competitive commercial uses, a sort of moat had been established around this novel technology.
Competition Heats Up
As 2023 has progressed, competitive open-source models have been released monthly, if not weekly. Some key aspects can determine the utility of these models for business use cases:
● Commercial permissibility (and data usage ToS!)
● Training / tuning paradigm
● Code generation
A New Tooling Ecosystem
To support this emerging ecosystem of LLMs, a whole host of new tools have emerged, and some existing solutions have found new life. Some prominent areas of novel tooling include:
● Chain creation and management
● Vector data stores (see the sketch below)
● LLM training and inference platforms
● LLM testing and monitoring suites
● Labeling platforms
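As a toy illustration of the idea behind a vector data store: embed documents once, then retrieve by cosine similarity at query time. The random vectors below are stand-ins for real embedding-model outputs.

```python
# Minimal in-memory "vector store": normalized embeddings + cosine retrieval.
import numpy as np

rng = np.random.default_rng(0)
docs = ["refund policy", "shipping times", "password reset"]
doc_vecs = rng.normal(size=(len(docs), 384))  # stand-in embeddings
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    query_vec = query_vec / np.linalg.norm(query_vec)
    sims = doc_vecs @ query_vec               # cosine similarity
    top = np.argsort(-sims)[:k]
    return [(docs[i], float(sims[i])) for i in top]

print(retrieve(rng.normal(size=384)))
```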
Trends in Methodology, Both New…
Along with the proliferation of new tools, new techniques that aid in working with these new models have gained popularity.
Parameter-efficient fine-tuning (PEFT) techniques including LoRA, p-tuning, etc. have emerged as a viable path for domain/task adaptation on consumer hardware (along with quantization), as sketched below.
New research has also emerged this year focusing on more flexible positional-embedding techniques (ALiBi, RoPE) that allow for model training (and adaptation) with longer context windows.
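A minimal sketch of setting up LoRA with the peft library; the base model, target modules, and hyperparameters below are illustrative and depend on the architecture being adapted:

```python
# Wrap a base model with LoRA adapters so only small low-rank
# update matrices are trained, not the full parameter set.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # gpt2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a small fraction of the base model
```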
… and Familiar
Along with the proliferation of new tools, many familiar tools and paradigms are being revisited and revitalized, including components of more traditional chatbots, information retrieval systems, and enterprise labeling platforms.
A New Hammer (But Not for Every Nail)
LLMs excel in a variety of generative use cases, such as conversational assistance, content generation (code, templates, etc.), and retrieval-augmented generation. They also enable novel opportunities via autonomous agents.
Although LLMs are capable across a wide variety of tasks, for discriminative use cases, when data is available at scale, and/or when factual accuracy is critical (without a human in the loop), smaller, more efficient models are often a better option.
Impact?
Engineers and Architects
We can build generative AI directly into our platforms (at a cost) without the typical ML R&D lifecycle, but it's difficult to get traction beyond the PoC stage
Researchers and Practitioners
The bulk of generative NLP use cases will likely be handled by LLMs, and we need to understand how to best utilize, maintain, and constrain these systems. BUT, genAI is not always the answer!
Business Interests and Specialists
Generative AI is in full-on disruption mode, and I need to figure out how to integrate it into our business as quickly as possible
Questions?