Smoothing N-gram Models
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, MPhil, SEDA(UK)
Smoothing
• What do we do with words that are in our vocabulary (they are not
unknown words) but appear in a test set in an unseen context (for
example they appear after a word they never appeared after in
training)?
• To keep a language model from assigning zero probability to these
unseen events, we’ll have to shave off a bit of probability mass from
some more frequent events and give it to the events we’ve never
seen.
• This modification is called smoothing or discounting.
• There are a variety of ways to do smoothing:
– Add-1 smoothing
– Add-k smoothing
– Good-Turing Discounting
– Stupid backoff
– Kneser-Ney smoothing and many more
Laplace Smoothing / Add 1 Smoothing
• The simplest way to do smoothing is to add one to all
the bigram counts, before we normalize them into
probabilities.
• All the counts that used to be zero will now have a
count of 1, the counts of 1 will be 2, and so on.
• This algorithm is called Laplace smoothing.
• Laplace smoothing does not perform well enough to be
used in modern N-gram models, but it usefully
introduces many of the concepts that we see in other
smoothing algorithms, gives a useful baseline, and is
also a practical smoothing algorithm for other tasks like
text classification.
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Let’s start with the application of Laplace smoothing to
unigram probabilities.
• Recall that the unsmoothed maximum likelihood estimate
of the unigram probability of the word wi is its count ci
normalized by the total number of word tokens N:
$P(W_i) = \frac{C_i}{N}$
• Laplace smoothing merely adds one to each count.
• Since there are V words in the vocabulary and each one
was incremented, we also need to adjust the denominator
to take into account the extra V observations.
$P_{Laplace}(W_i) = \frac{C_i + 1}{N + V}$
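• To make the two estimates concrete, here is a minimal sketch (not from the slides); the toy corpus, the choice to treat "lunch" as an in-vocabulary but unseen word, and all variable names are illustrative assumptions:

from collections import Counter

# Toy corpus and vocabulary -- illustrative assumptions, not the slides' data.
tokens = "i want to eat chinese food i want food".split()
vocab = set(tokens) | {"lunch"}   # "lunch" is in-vocabulary but unseen in training

counts = Counter(tokens)          # C_i for each word type
N = len(tokens)                   # total word tokens (9)
V = len(vocab)                    # vocabulary size (7)

def p_mle(word):
    # Unsmoothed maximum likelihood estimate: C_i / N
    return counts[word] / N

def p_laplace(word):
    # Add-one (Laplace) estimate: (C_i + 1) / (N + V)
    return (counts[word] + 1) / (N + V)

print(p_mle("lunch"), p_laplace("lunch"))   # 0.0 vs. 1/16 = 0.0625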
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Instead of changing both the numerator and
denominator, it is convenient to describe how a
smoothing algorithm affects the numerator, by defining
an adjusted count C*.
• This adjusted count is easier to compare directly with
the MLE counts and can be turned into a probability
like an MLE count by normalizing by N.
• To define this count, since we are only changing the numerator, in addition to adding 1 we'll also need to multiply by a normalization factor $\frac{N}{N+V}$:

$C_i^* = (C_i + 1)\,\frac{N}{N+V}$
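• Continuing the unigram sketch above (reusing counts, N, and V), the adjusted count follows directly from the formula; the helper name is illustrative:

def adjusted_count(word):
    # C_i* = (C_i + 1) * N / (N + V); the adjusted counts over the whole
    # vocabulary still sum to N, so they compare directly with MLE counts.
    return (counts[word] + 1) * N / (N + V)

print(adjusted_count("food"))    # (2 + 1) * 9 / 16 = 1.6875, discounted from 2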
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• A related way to view smoothing is as discounting
(lowering) some non-zero counts in order to get
the probability mass that will be assigned to the
zero counts.
• Thus, instead of referring to the discounted counts $C^*$, we might describe a smoothing algorithm in terms of a relative discount $d_c$, the ratio of the discounted counts to the original counts:

$d_c = \frac{C^*}{C}$
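• The relative discount then falls out of the same sketch, for any word with a non-zero original count:

def relative_discount(word):
    # d_c = C* / C: the ratio of the discounted count to the original count
    return adjusted_count(word) / counts[word]

print(relative_discount("food"))   # 1.6875 / 2 = 0.84375: mass was shaved off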
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Let's apply Laplace smoothing to the Berkeley Restaurant Project bigrams.
Laplace Smoothing / Add 1 Smoothing
(Cont…)
[Table: original bigram counts for the Berkeley Restaurant Project corpus]
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Recall that normal bigram probabilities are computed by
normalizing each row of counts by the unigram count:
$P(W_n \mid W_{n-1}) = \frac{C(W_{n-1}W_n)}{C(W_{n-1})}$
• For add-one smoothed bigram counts, we need to augment the
unigram count by the number of total word types in the vocabulary
V:
$P^*_{Laplace}(W_n \mid W_{n-1}) = \frac{C(W_{n-1}W_n) + 1}{C(W_{n-1}) + V}$
• Thus, each of the unigram counts given in the previous table will need to be augmented by V = 1446.
• Ex: $P^*_{Laplace}(to \mid I) = \frac{C(I\ to) + 1}{C(I) + V} = \frac{0 + 1}{2500 + 1446} = \frac{1}{3946} = 0.000253$
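• A two-line check of the worked example, using the values quoted on this slide, C(I to) = 0, C(I) = 2500, and V = 1446:

c_i_to, c_i, V = 0, 2500, 1446          # values from the slide's example
p = (c_i_to + 1) / (c_i + V)
print(p)                                 # 1/3946 ≈ 0.000253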
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• The following table shows the add-one smoothed probabilities for the bigrams:
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• It is often convenient to reconstruct the count
matrix so we can see how much a smoothing
algorithm has changed the original counts.
• These adjusted counts can be computed by
using following equation:
$C^*(W_{n-1}W_n) = \frac{[C(W_{n-1}W_n) + 1] \times C(W_{n-1})}{C(W_{n-1}) + V}$
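• As a sketch, the reconstruction can be checked for the bigram "want to", assuming C(want to) = 608 and C(want) = 927 from the (unreproduced) counts table; both assumed figures are consistent with the numbers quoted on the next slide:

c_want_to, c_want, V = 608, 927, 1446      # assumed corpus counts
c_star = (c_want_to + 1) * c_want / (c_want + V)
print(round(c_star))                        # 238, down from 608

p_unsmoothed = c_want_to / c_want           # ≈ 0.66
p_smoothed = (c_want_to + 1) / (c_want + V) # ≈ 0.26
print(round(p_unsmoothed, 2), round(p_smoothed, 2))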
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• The following table shows the reconstructed counts:
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Note that add-one smoothing has made a very big change to the
counts.
• C(want to) changed from 608 to 238!
• We can see this in probability space as well: P(to|want) decreases
from .66 in the unsmoothed case to .26 in the smoothed case.
• Looking at the discount d (the ratio between new and old counts)
shows us how strikingly the counts for each prefix word have been
reduced; the discount for the bigram want to is .39, while the
discount for Chinese food is .10, a factor of 10!
• The sharp change in counts and probabilities occurs because too
much probability mass is moved to all the zeros.
Add-K Smoothing
• One alternative to add-one smoothing is to move a bit
less of the probability mass from the seen to the
unseen events.
• Instead of adding 1 to each count, we add a fractional
count k (.5? .05? .01?).
• This algorithm is therefore called add-k smoothing.
$P^*_{Add\text{-}k}(W_n \mid W_{n-1}) = \frac{C(W_{n-1}W_n) + k}{C(W_{n-1}) + kV}$
• Add-k smoothing requires that we have a method for
choosing k; this can be done, for example, by
optimizing on a development set (devset).
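• A minimal sketch of add-k bigram estimation with k chosen on a devset; the tiny corpora and the candidate k values are illustrative assumptions:

from collections import Counter
import math

def train(tokens):
    return Counter(zip(tokens, tokens[1:])), Counter(tokens)

def p_add_k(prev, w, bigrams, unigrams, k, V):
    # P*(w | prev) = (C(prev w) + k) / (C(prev) + kV)
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * V)

def dev_log_likelihood(dev, bigrams, unigrams, k, V):
    return sum(math.log(p_add_k(a, b, bigrams, unigrams, k, V))
               for a, b in zip(dev, dev[1:]))

train_tokens = "i want to eat chinese food i want food".split()
dev_tokens = "i want chinese food".split()
bigrams, unigrams = train(train_tokens)
V = len(set(train_tokens))

# Choose the k that maximizes devset log-likelihood
# (equivalently, minimizes devset perplexity).
best_k = max((0.5, 0.05, 0.01),
             key=lambda k: dev_log_likelihood(dev_tokens, bigrams, unigrams, k, V))
print(best_k)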
Good-Turing Discounting
• The basic insight of Good-Turing smoothing is to re-estimate the amount of
probability mass to assign to N-grams with zero or low counts by looking at the
number of N-grams with higher counts.
• In other words, we examine Nc, the number of N-grams that occur c times.
• We refer to the number of N-grams that occur c times as the frequency of
frequency c.
• So applying the idea to smoothing the joint probability of bigrams, N0 is the
number of bigrams b of count 0, N1 the number of bigrams with count 1, and so
on:
$N_c = \sum_{b:\,c(b)=c} 1$
• The Good-Turing estimate gives a smoothed count c* based on the set of Nc for all
c, as follows:
$c^* = (c + 1)\,\frac{N_{c+1}}{N_c}$
https://www.youtube.com/watch?v=z1bq4C8hFEk
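• A sketch of the Good-Turing count re-estimate from a frequency-of-frequency table; the Nc values below are made-up inputs, and the gaps in Nc that real corpora show (handled in practice by fitting techniques such as Simple Good-Turing) are ignored:

# Hypothetical frequency-of-frequency table: Nc[c] = number of bigrams seen c times.
Nc = {0: 5000, 1: 120, 2: 40, 3: 24, 4: 13, 5: 15}

def good_turing_count(c):
    # c* = (c + 1) * N_{c+1} / N_c
    return (c + 1) * Nc[c + 1] / Nc[c]

for c in range(0, 5):
    print(c, good_turing_count(c))   # c = 0 now gets a small non-zero count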
Backoff and Interpolation
• In the backoff model, like the deleted interpolation model,
we build an N-gram model based on an (N-1)-gram model.
• The difference is that in backoff, if we have non-zero
trigram counts, we rely solely on the trigram counts and
don’t interpolate the bigram and unigram counts at all.
• We only ‘back off’ to a lower-order N-gram if we have zero
evidence for a higher-order N-gram.
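• A minimal sketch of the backoff decision described above, in the spirit of the 'stupid backoff' method listed earlier: use the trigram estimate when its count is non-zero, otherwise back off with a fixed weight (an assumption; without proper discounting these are scores, not a normalized distribution):

from collections import Counter

tokens = "i want to eat chinese food i want to eat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
N = len(tokens)

def backoff_score(w1, w2, w3, alpha=0.4):
    # Rely solely on the trigram when there is non-zero evidence for it;
    # only 'back off' to a lower-order N-gram when that count is zero.
    if trigrams[(w1, w2, w3)] > 0:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
    if bigrams[(w2, w3)] > 0:
        return alpha * bigrams[(w2, w3)] / unigrams[w2]
    return alpha * alpha * unigrams[w3] / N

print(backoff_score("i", "want", "to"))    # trigram seen: 2/2 = 1.0
print(backoff_score("food", "i", "eat"))   # backs off to the unigram estimate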
Reference
• https://www.youtube.com/watch?v=z1bq4C8hFEk