Evaluating Language Models
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, MPhil, SEDA(UK)
Extrinsic Evaluation
• The best way to evaluate the performance of a language
model is to embed it in an application and measure how
much the application improves.
• Such end-to-end evaluation is called extrinsic evaluation.
• Extrinsic evaluation is the only way to know if a
particular improvement in a component is really going to
help the task at hand.
• Thus, for speech recognition, we can compare the
performance of two language models by running the
speech recognizer twice, once with each language model,
and seeing which gives the more accurate transcription.
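As a rough sketch of what this comparison looks like in code, the following Python computes word error rate (WER) from edit distance; the two transcription lists are assumed to come from running a real recognizer once with each language model (nothing here is from the original slides):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance, normalized by reference length."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

def compare_lms(references, hyps_with_lm_a, hyps_with_lm_b):
    """Average WER of two recognizer runs (one per language model) on the same audio."""
    wer_a = sum(wer(r, h) for r, h in zip(references, hyps_with_lm_a)) / len(references)
    wer_b = sum(wer(r, h) for r, h in zip(references, hyps_with_lm_b)) / len(references)
    return wer_a, wer_b  # the language model yielding lower WER wins
```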
Intrinsic Evaluation
• Unfortunately, running big NLP systems end-
to-end is often very expensive.
• Instead, it would be nice to have a metric that
can be used to quickly evaluate potential
improvements in a language model.
• An intrinsic evaluation metric is one that
measures the quality of a model independent
of any application.
Intrinsic Evaluation (Cont…)
• For an intrinsic evaluation of a language model we need a test set.
• The probabilities of an N-gram model come from the corpus
it is trained on, called the training set or training corpus.
• We can then measure the quality of an N-gram model by its
performance on some unseen data called the test set or
test corpus.
• We will also sometimes call test sets and other datasets that are
not in our training sets held out corpora because we hold them out
from the training data.
• So if we are given a corpus of text and want to compare two
different N-gram models, we divide the data into training and test
sets, train the parameters of both models on the training set, and
then compare how well the two trained models fit the test set.
Intrinsic Evaluation (Cont…)
• But what does it mean to “fit the test set”?
– Whichever model assigns a higher probability to
the test set—meaning it more accurately predicts
the test set—is a better model.
• Given two probabilistic models, the better
model is the one that has a tighter fit to the
test data or that better predicts the details of
the test data, and hence will assign a higher
probability to the test data.
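A minimal sketch of this comparison, assuming each trained model exposes a hypothetical prob(word, prev) conditional-probability method (not a real library API):

```python
import math

def test_set_log_prob(model, test_words):
    """Total log probability a model assigns to a test word sequence.
    `model.prob(word, prev)` is an assumed interface for illustration."""
    log_p = 0.0
    for prev, word in zip(test_words, test_words[1:]):
        log_p += math.log(model.prob(word, prev))
    return log_p

# Whichever of two trained models returns the higher total log
# probability on the same test set fits the test set better.
```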
Intrinsic Evaluation (Cont…)
• Since our evaluation metric is based on test set probability,
it’s important not to let the test sentences into the training
set.
• Suppose we are trying to compute the probability of a
particular “test” sentence.
• If our test sentence is part of the training corpus, we will
mistakenly assign it an artificially high probability when it
occurs in the test set.
• We call this situation training on the test set.
• Training on the test set introduces a bias that makes the
probabilities all look too high and causes huge inaccuracies
in perplexity (the probability-based evaluation metric
introduced below).
Development Test
• Sometimes we use a particular test set so often that we implicitly tune to its
characteristics.
• We then need a fresh test set that is truly unseen.
• In such cases, we call the initial test set the development test set, or devset.
• How do we divide our data into training, development, and test sets?
• We want our test set to be as large as possible, since a small test set may be
accidentally unrepresentative, but we also want as much training data as possible.
• At the minimum, we would want to pick the smallest test set that gives us enough
statistical power to measure a statistically significant difference between two
potential models.
• In practice, we often just divide our data into 80% training, 10% development, and
10% test.
• Given a large corpus that we want to divide into training and test, test data can
either be taken from some continuous sequence of text inside the corpus, or we
can remove smaller “stripes” of text from randomly selected parts of our corpus
and combine them into a test set.
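A minimal sketch of such a split, assuming the corpus is already a list of sentences; shuffling first implements the random-stripes strategy, while omitting the shuffle takes the test data as one continuous sequence:

```python
import random

def split_corpus(sentences, train=0.8, dev=0.1, seed=13):
    """Split a corpus into training / development / test portions
    (defaults give the 80% / 10% / 10% split described above)."""
    sents = list(sentences)
    random.Random(seed).shuffle(sents)  # "random stripes"; drop for a contiguous split
    n = len(sents)
    n_train, n_dev = int(train * n), int(dev * n)
    return (sents[:n_train],
            sents[n_train:n_train + n_dev],
            sents[n_train + n_dev:])
```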
Perplexity
• In practice we don’t use raw probability as our
metric for evaluating language models, but a
variant called perplexity.
• The perplexity (sometimes called PP for short) of a
language model on a test set is the inverse probability of
the test set, normalized by the number of words. For a test
set W = w1, w2, ……, wN:

$$PP(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1 w_2 \ldots w_N)}}$$
Perplexity (Cont…)
• We can use the chain rule to expand the
probability of W:

$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 w_2 \ldots w_{i-1})}}$$

• Thus, if we are computing the perplexity of W
with a bigram language model, we get:

$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}$$
Perplexity (Cont…)
• Note that because of the inverse in previous equations, the
higher the conditional probability of the word sequence,
the lower the perplexity.
• Thus, minimizing perplexity is equivalent to maximizing the
test set probability according to the language model.
• What we generally use for the word sequence W in these
equations is the entire sequence of words in some test set.
• Since this sequence will cross many sentence boundaries,
we need to include the begin- and end-sentence markers
<s> and </s> in the probability computation.
• We also need to include the end-of-sentence marker </s>
(but not the beginning-of-sentence marker <s>) in the total
count of word tokens N.
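Putting these pieces together, here is a minimal sketch of bigram perplexity over a test set, following the conventions above; `prob(w, prev)` is an assumed stand-in for a trained bigram model:

```python
import math

def bigram_perplexity(prob, sentences):
    """Perplexity of a bigram model over a list of tokenized sentences.
    `prob(w, prev)` is an assumed callable returning P(w | prev).
    </s> is counted in N; <s> is not, matching the convention above."""
    log_p, n_tokens = 0.0, 0
    for sent in sentences:
        words = ["<s>"] + sent + ["</s>"]
        for prev, w in zip(words, words[1:]):
            log_p += math.log(prob(w, prev))
        n_tokens += len(sent) + 1  # sentence words plus </s>, excluding <s>
    return math.exp(-log_p / n_tokens)  # PP(W) = P(W)^(-1/N)
```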
Perplexity (Cont…)
• There is another way to think about perplexity: as the weighted
average branching factor of a language.
• The branching factor of a language is the number of possible next
words that can follow any word.
• Consider the task of recognizing the digits in English (zero, one,
two,..., nine), given that each of the 10 digits occurs with equal
probability P = 1/10.
• The perplexity of this mini-language is in fact 10.
• To see that, imagine a string of digits of length N.
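Working this out from the definition: any particular digit string of length N has probability (1/10)^N, so

$$PP(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \left(\left(\frac{1}{10}\right)^{N}\right)^{-\frac{1}{N}} = \left(\frac{1}{10}\right)^{-1} = 10$$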
Perplexity for Comparing Different N-gram Models
• We trained unigram, bigram, and trigram grammars on 38
million words (including start-of-sentence tokens) from the
Wall Street Journal, using a 19,979 word vocabulary.
• We then computed the perplexity of each of these models
on a test set of 1.5 million words, using the perplexity
equation above.
• The table below shows the perplexity of a 1.5 million word
WSJ test set according to each of these grammars.
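N-gram order:   Unigram   Bigram   Trigram
Perplexity:       962       170       109

(figures as reported for this experiment in Jurafsky & Martin's Speech and Language Processing)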
Perplexity for Comparing Different N-gram Models (Cont…)
• As we see above, the more information the N-
gram gives us about the word sequence, the
lower the perplexity.
• Note that in computing perplexities, the N-gram
model P must be constructed without any
knowledge of the test set or any prior knowledge
of the vocabulary of the test set.
• Any kind of knowledge of the test set can cause
the perplexity to be artificially low.
• The perplexity of two language models is only
comparable if they use identical vocabularies.
Generalization and Zeros
• Statistical models are likely to be pretty useless as
predictors if the training set and the test set are very
different.
• How should we deal with this problem when we
build N-gram models?
• One way is to be sure to use a training corpus
that has a similar genre to whatever task we are
trying to accomplish.
• To build a language model for translating legal
documents, we need a training corpus of legal
documents.
Generalization and Zeros (Cont…)
• Matching genres is still not sufficient.
• Our models may still be subject to the problem of sparsity.
• For any N-gram that occurred a sufficient number of times,
we might have a good estimate of its probability.
• But because any corpus is limited, some perfectly
acceptable English word sequences are bound to be
missing from it.
• That is, we’ll have many cases of putative “zero probability
N-grams” that should really have some non-zero
probability.
• Consider the words that follow the bigram denied the in
the WSJ Treebank3 corpus, together with their counts:
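denied the allegations: 5
denied the speculation: 2
denied the rumors: 1
denied the report: 1

(counts as reported in Jurafsky & Martin; a test set may then contain perfectly plausible continuations such as denied the offer or denied the loan, to which this model assigns zero probability)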
Generalization and Zeros (Cont…)
• To build a language model for a question-
answering system, we need a training corpus
of questions.
The Zeros
• Zeros (things that don’t ever occur in the training set
but do occur in the test set) are a problem for two
reasons.
– First, their presence means we are underestimating the
probability of all sorts of words that might occur, which will
hurt the performance of any application we want to run on
this data.
– Second, if the probability of any word in the test set is 0,
the entire probability of the test set is 0.
• By definition, perplexity is based on the inverse probability of
the test set.
• Thus if some words have zero probability, we can’t compute
perplexity at all, since we can’t divide by 0!
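A tiny illustration, using the denied the counts above as maximum-likelihood estimates (the continuation offer never occurred in training):

```python
import math

# Hypothetical ML estimates of P(w | "denied the") from the counts above:
# 5 of 9 observed continuations were "allegations"; "offer" never occurred.
p = {"allegations": 5 / 9, "speculation": 2 / 9, "offer": 0.0}

p_test = p["allegations"] * p["offer"]  # any zero factor zeroes the product
print(p_test)                           # 0.0
# math.log(p_test)                      # ValueError: perplexity is undefined
```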
Unknown Words
• The previous section discussed the problem of words
whose bigram probability is zero.
• But what about words we simply have never seen before?
• Closed Vocabulary
– Sometimes we have a language task in which this can’t happen
because we know all the words that can occur.
– In such a closed vocabulary system the test set can only contain
words from this lexicon, and there will be no unknown words.
– This is a reasonable assumption in some domains, such as
speech recognition or machine translation, where we have a
pronunciation dictionary or a phrase table that is fixed in
advance, and so the language model can only use the words in
that dictionary or phrase table.
Unknown Words (Cont…)
• Open Vocabulary
– In other cases we have to deal with words we
haven’t seen before, which we’ll call unknown
words, or out of vocabulary (OOV) words.
– The percentage of OOV words that appear in the
test set is called the OOV rate.
– An open vocabulary system is one in which we
model these potential unknown words in the test
set by adding a pseudo-word called <UNK>.
Train the Probabilities of Unknown Words
• There are two common ways to train the
probabilities of the unknown word model <UNK>.
• 1st Method:
– Turn the problem back into a closed vocabulary one
by choosing a fixed vocabulary in advance:
– In the training set, convert any word that is not in this
vocabulary (any OOV word) to the unknown word token
<UNK> in a text normalization step.
– Estimate the probabilities for <UNK> from its counts
just like any other regular word in the training set.
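A minimal sketch of this normalization step, assuming the fixed vocabulary has already been chosen:

```python
def replace_oov(tokens, vocabulary):
    """Map any token outside the fixed, pre-chosen vocabulary to the
    <UNK> pseudo-word, as a text-normalization step before training."""
    return [tok if tok in vocabulary else "<UNK>" for tok in tokens]

# replace_oov(["he", "denied", "the", "zyzzyva"], {"he", "denied", "the"})
# -> ["he", "denied", "the", "<UNK>"]
```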
Train the Probabilities of Unknown Words (Cont…)
• 2nd Method
– The second alternative, in situations where we don’t
have a vocabulary fixed in advance, is to create one
implicitly, replacing words in the training data with
<UNK> based on their frequency.
– For example, we can replace with <UNK> all words that
occur fewer than n times in the training set, where n is
some small number; or, equivalently, select a vocabulary
size V in advance (say 50,000), keep the top V words by
frequency, and replace the rest with <UNK>.
– In either case we then proceed to train the language
model as before, treating <UNK> like a regular word.
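A minimal sketch of deriving the vocabulary implicitly from training frequencies; the min_count and max_size defaults are illustrative, and the resulting vocabulary would then be applied with a normalization step like replace_oov above:

```python
from collections import Counter

def build_vocabulary(tokens, min_count=2, max_size=50_000):
    """Keep the `max_size` most frequent training words that occur at
    least `min_count` times; all other words are later rewritten as
    <UNK> before the language model is trained."""
    counts = Counter(tokens)
    vocab = {w for w, c in counts.most_common(max_size) if c >= min_count}
    vocab.add("<UNK>")
    return vocab
```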
Train the Probabilities of Unknown Words (Cont…)
• The exact choice of <UNK> model does
have an effect on metrics like perplexity.
• A language model can achieve low perplexity
by choosing a small vocabulary and assigning
the unknown word a high probability.
• For this reason, perplexities should only be
compared across language models with the
same vocabularies.