Artificial Intelligence
Language Models

Maryam Khordad
The University of Western Ontario
1
Natural Language Processing
• The idea behind NLP: To give computers
the ability to process human language.
• To get computers to perform useful tasks
involving human language:
– Dialogue systems
• Cleverbot

– Machine translation
– Question answering (Search engines)

2
Natural Language Processing
• Knowledge of language.
• An important piece of information about a
language: Is a string a valid member of the
language or not?
– Word
– Sentence
– Phrase
– …

3
Language Models
• Formal grammars (e.g. regular, context free)
give a hard “binary” model of the legal
sentences in a language.
• For NLP, a probabilistic model of a
language that gives a probability that a
string is a member of a language is more
useful.
• To specify a correct probability distribution,
the probability of all sentences in a
language must sum to 1.
4
Uses of Language Models
• Speech recognition
– “I ate a cherry” is a more likely sentence than “Eye eight
uh Jerry”

• OCR & Handwriting recognition
– More probable sentences are more likely correct readings.

• Machine translation
– More likely sentences are probably better translations.

• Generation
– More likely sentences are probably better NL generations.

• Context-sensitive spelling correction
– “Their are problems wit this sentence.”
5
Completion Prediction
• A language model also supports predicting
the completion of a sentence.
– Please turn off your cell _____
– Your program does not ______

• Predictive text input systems can guess what
you are typing and give choices on how to
complete it.

6
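A minimal sketch of how a predictive-text system might rank completions, using bigram counts from a toy corpus (the corpus and counts here are hypothetical):

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical); a real system trains on millions of sentences.
corpus = [
    "please turn off your cell phone".split(),
    "please turn in your homework".split(),
]

bigram_counts = defaultdict(Counter)
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        bigram_counts[prev][word] += 1

def complete(prev_word, k=3):
    """Suggest the k words most frequently observed after prev_word."""
    return [w for w, _ in bigram_counts[prev_word].most_common(k)]

print(complete("turn"))   # ['off', 'in']
```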
N-Gram Word Models
• We can have n-gram models over sequences of words,
characters, syllables or other units.
• Estimate probability of each word given prior context.
– P(phone | Please turn off your cell)

• An N-gram model uses only N−1 words of prior context.
– Unigram: P(phone)
– Bigram: P(phone | cell)
– Trigram: P(phone | your cell)

• The Markov assumption is the presumption that the future
behavior of a dynamical system depends only on its recent
history. In particular, in a kth-order Markov model, the
next state depends only on the k most recent states;
therefore an N-gram model is an (N−1)th-order Markov model.
7
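To make the limited context concrete, here is a small sketch (not from the slides) that enumerates the (N−1)-word contexts an N-gram model conditions on:

```python
def ngrams(words, n):
    """Yield (context, word) pairs, where context is the previous n-1 words."""
    for i, word in enumerate(words):
        context = tuple(words[max(0, i - n + 1):i])
        yield context, word

sentence = "please turn off your cell phone".split()
for context, word in ngrams(sentence, 3):   # trigram: 2 words of context
    print(context, "->", word)
# () -> please, ('please',) -> turn, ('please', 'turn') -> off, ...
```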
N-Gram Model Formulas
• Word sequences:
$$w_1^n = w_1 \ldots w_n$$

• Chain rule of probability:
$$P(w_1^n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2) \cdots P(w_n \mid w_1^{n-1}) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})$$

• Bigram approximation:
$$P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})$$

• N-gram approximation:
$$P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-N+1}^{k-1})$$
8
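A sketch of the bigram approximation in code, using hand-picked (hypothetical) bigram probabilities and the sentence-boundary symbols introduced on the next slide; log space avoids underflow on long sentences:

```python
import math

# Hypothetical bigram probabilities P(w_k | w_{k-1}).
P = {
    ("<s>", "i"): 0.25, ("i", "ate"): 0.10, ("ate", "a"): 0.30,
    ("a", "cherry"): 0.02, ("cherry", "</s>"): 0.40,
}

def bigram_sentence_logprob(words):
    """log P(w_1..w_n) under the bigram approximation."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(P[(prev, w)])          # KeyError if a bigram is unseen
               for prev, w in zip(tokens, tokens[1:]))

print(math.exp(bigram_sentence_logprob("i ate a cherry".split())))  # 6e-05
```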
Estimating Probabilities
• N-gram conditional probabilities can be estimated
from raw text based on the relative frequency of
word sequences.
Bigram:
$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n)}{C(w_{n-1})}$$

N-gram:
$$P(w_n \mid w_{n-N+1}^{n-1}) = \frac{C(w_{n-N+1}^{n-1}\, w_n)}{C(w_{n-N+1}^{n-1})}$$
• To have a consistent probabilistic model, append a
unique start (<s>) and end (</s>) symbol to every
sentence and treat these as additional words.
9
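A sketch of these relative-frequency estimates over a toy corpus, with <s> and </s> appended as the slide describes (the corpus is hypothetical):

```python
from collections import Counter

corpus = ["i ate a cherry", "i ate a plum"]          # toy training data

unigram_counts, bigram_counts = Counter(), Counter()
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def p_mle(w, prev):
    """Relative-frequency estimate P(w | prev) = C(prev w) / C(prev)."""
    return bigram_counts[(prev, w)] / unigram_counts[prev]

print(p_mle("ate", "i"))      # 1.0  -- "ate" always follows "i" here
print(p_mle("cherry", "a"))   # 0.5
```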
N-gram character models
• One of the simplest language models: $P(c_{1:N})$
• Language identification: given a text, determine
which language it is written in.
• Build a trigram character model of each candidate
language $\ell$: $P(c_i \mid c_{i-2:i-1}, \ell)$
• We want to find the most probable language given
the text:
$$\ell^* = \operatorname*{argmax}_{\ell} P(\ell \mid c_{1:N}) = \operatorname*{argmax}_{\ell} P(\ell)\,P(c_{1:N} \mid \ell) = \operatorname*{argmax}_{\ell} P(\ell) \prod_{i=1}^{N} P(c_i \mid c_{i-2:i-1}, \ell)$$
10
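A compact sketch of trigram-character language identification, assuming a uniform prior P(ℓ) and using add-one smoothing so unseen trigrams do not zero out the product (the mini-corpora are hypothetical):

```python
import math
from collections import Counter, defaultdict

def train_char_trigram_model(text):
    """Return a log P(text | l) scorer built from trigram character counts."""
    trigram, context = defaultdict(Counter), Counter()
    padded = "  " + text              # pad so every char has 2 chars of context
    for i in range(2, len(padded)):
        trigram[padded[i-2:i]][padded[i]] += 1
        context[padded[i-2:i]] += 1
    V = len(set(padded))              # character vocabulary for add-one smoothing
    def score(s):
        p = "  " + s
        return sum(math.log((trigram[p[i-2:i]][p[i]] + 1) /
                            (context[p[i-2:i]] + V))
                   for i in range(2, len(p)))
    return score

models = {
    "en": train_char_trigram_model("the cat sat on the mat near the door"),
    "es": train_char_trigram_model("el gato se sienta en la alfombra"),
}
# argmax over l of log P(l) + log P(text | l), with a uniform prior P(l):
print(max(models, key=lambda l: models[l]("the cat")))   # 'en'
```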
Train and Test Corpora
• We call a body of text a corpus (plural corpora).
• A language model must be trained on a large
corpus of text to estimate good parameter values.
• Model can be evaluated based on its ability to
predict a high probability for a disjoint (held-out)
test corpus (testing on the training corpus would
give an optimistically biased estimate).
• Ideally, the training (and test) corpus should be
representative of the actual application data.

11
Unknown Words
• How to handle words in the test corpus that
did not occur in the training data, i.e. out of
vocabulary (OOV) words?
• Train a model that includes an explicit
symbol for an unknown word (<UNK>).
– Choose a vocabulary in advance and replace
other words in the training corpus with
<UNK>.
– Replace the first occurrence of each word in the
training data with <UNK>.
12
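A sketch of the second strategy, replacing the first occurrence of every word in the training data with <UNK>:

```python
def mark_first_occurrences(tokens):
    """Replace each word's first occurrence with <UNK>; later ones survive."""
    seen, out = set(), []
    for w in tokens:
        out.append(w if w in seen else "<UNK>")
        seen.add(w)
    return out

print(mark_first_occurrences("the cat saw the cat".split()))
# ['<UNK>', '<UNK>', '<UNK>', 'the', 'cat']
```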
Perplexity
• Measure of how well a model “fits” the test data.
• Uses the probability that the model assigns to the
test corpus.
• Normalizes for the number of words in the test
corpus and takes the inverse.
$$\text{Perplexity}(W_1^N) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1 w_2 \ldots w_N)}}$$
• Measures the weighted average branching factor
in predicting the next word (lower is better).

13
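The formula in code (a sketch, computed in log space for numerical stability). As a sanity check on the branching-factor reading: a model that is uniform over a hypothetical 100-word vocabulary gets perplexity exactly 100:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = P(w_1..w_N)^(-1/N), from per-token log probabilities."""
    N = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / N)

# Uniform model over a 100-word vocabulary: P(w) = 1/100 for every token.
print(perplexity([math.log(1 / 100)] * 50))   # 100.0
```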
Sample Perplexity Evaluation
• Models trained on 38 million words from
the Wall Street Journal (WSJ) using a
19,979 word vocabulary.
• Evaluate on a disjoint set of 1.5 million
WSJ words.
             Unigram   Bigram   Trigram
Perplexity     962       170       109
14
Smoothing
• Since there are a combinatorial number of possible
strings, many rare (but not impossible)
combinations never occur in training, so the
system incorrectly assigns zero to many
parameters.
• If a new combination occurs during testing, it is
given a probability of zero and the entire sequence
gets a probability of zero (i.e. infinite perplexity).
• Example:
– “--th” Very common
– “--ht” Uncommon, but what if we see this sentence:
“This program issues an http request.”
15
Smoothing
• In practice, parameters are smoothed to reassign
some probability mass to unseen events.
– Adding probability mass to unseen events requires
removing it from seen ones (discounting) in order to
maintain a joint distribution that sums to 1.

16
Laplace (Add-One) Smoothing
• The simplest type of smoothing.
• In the absence of further information, if a random
Boolean variable X has been false in all n
observations so far, then the estimate for P(X=true)
should be 1/(n+2).
• Laplace assumed that with two more trials, one
might be true and one false.
• Performs relatively poorly.

17
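The same idea applied to bigrams, as a sketch: add one to every count and add the vocabulary size V to the denominator (the counts below are hypothetical):

```python
def p_add_one(w, prev, bigram_counts, unigram_counts, V):
    """Add-one estimate: P(w | prev) = (C(prev w) + 1) / (C(prev) + V)."""
    return (bigram_counts.get((prev, w), 0) + 1) / \
           (unigram_counts.get(prev, 0) + V)

# An unseen bigram such as ("an", "http") no longer gets zero probability:
print(p_add_one("http", "an", {}, {"an": 10}, V=1000))   # ~0.00099
```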
Advanced Smoothing
• Many advanced techniques have been
developed to improve smoothing for
language models.
– Interpolation
– Backoff

18
Model Combination
• As N increases, the power (expressiveness)
of an N-gram model increases, but the
ability to estimate accurate parameters from
sparse data decreases (i.e. the smoothing
problem gets worse).
• A general approach is to combine the results
of multiple N-gram models of increasing
complexity (i.e. increasing N).

19
Interpolation
• Linearly combine estimates of N-gram models of
increasing order.
Interpolated Trigram Model:
$$\hat{P}(w_n \mid w_{n-2}, w_{n-1}) = \lambda_3 P(w_n \mid w_{n-2}, w_{n-1}) + \lambda_2 P(w_n \mid w_{n-1}) + \lambda_1 P(w_n)$$
Where:
$$\sum_i \lambda_i = 1$$

• The λi can be fixed or can be trained.
• The λi can depend on the counts: if we have a high
count for a trigram, we weigh it relatively more;
otherwise we put more weight on the bigram and
unigram estimates.
20
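A sketch of the interpolated estimate with fixed weights (the component models and λ values here are placeholders; in practice the λi are tuned on held-out data):

```python
def p_interpolated(w, w1, w2, p_uni, p_bi, p_tri,
                   lambdas=(0.1, 0.3, 0.6)):        # weights must sum to 1
    """lambda1*P(w) + lambda2*P(w|w2) + lambda3*P(w|w1,w2)."""
    l1, l2, l3 = lambdas
    return l1 * p_uni(w) + l2 * p_bi(w, w2) + l3 * p_tri(w, w1, w2)

# Usage with stand-in model functions:
p = p_interpolated("phone", "your", "cell",
                   p_uni=lambda w: 0.001,
                   p_bi=lambda w, w2: 0.2,
                   p_tri=lambda w, w1, w2: 0.5)
print(p)   # 0.0001 + 0.06 + 0.3 = 0.3601
```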
Backoff
• Only use a lower-order model when data for the
higher-order model is unavailable (i.e. its count is zero).
• Recursively back off to weaker models until data
is available.

21
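A skeleton of the recursion (a sketch only: real backoff schemes such as Katz backoff also apply discounting so the probabilities still sum to 1):

```python
def p_backoff(w, context, counts, total_words):
    """counts maps n-gram tuples to counts; context is a tuple of prior words."""
    while context:
        ngram_count = counts.get(context + (w,), 0)
        ctx_count = counts.get(context, 0)
        if ngram_count > 0 and ctx_count > 0:
            return ngram_count / ctx_count
        context = context[1:]                  # back off to a shorter context
    return counts.get((w,), 0) / total_words   # unigram as the last resort
```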
Summary
• Language models assign a probability that a
sentence is a legal string in a language.
• They are useful as a component of many NLP
systems, such as ASR, OCR, and MT.
• Simple N-gram models are easy to train on
corpora and can provide useful estimates of
sentence likelihood.
• N-gram models give inaccurate parameter estimates
when trained on sparse data.
• Smoothing techniques adjust parameter estimates
to account for unseen (but not impossible) events.
22
Questions

23
About this slide presentation
• These slides were developed using Raymond
J. Mooney’s slides from the University of Texas at
Austin
(http://www.cs.utexas.edu/~mooney/cs388/).

24


Editor's Notes

  1. What distinguishes language processing applications from other data processing systems is their use of knowledge of language.