Language Modeling with
N-grams
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, MPhil, SEDA(UK)
Introduction
• Models that assign probabilities to sequences of words are called
language models or LMs.
• In this lesson we introduce the simplest model that assigns
probabilities to sentences and sequences of words, the N-gram.
• An N-gram is a sequence of words: a 2-gram (or bigram) is a two-
word sequence of words like “please turn”, “turn your”, or “your
homework”, and a 3-gram (or trigram) is a three-word sequence of
words like “please turn your”, or “turn your homework”.
• We’ll see how to use N-gram models to estimate the probability of
the last word of an N-gram given the previous words, and also to
assign probabilities to entire sequences.
• Whether estimating probabilities of next words or of whole
sequences, the N-gram model is one of the most important tools in
speech and language processing.
N-grams
• Let’s begin with the task of computing P(w|h),
the probability of a word w given some history
h.
• Suppose the history h is “its water is so
transparent that” and we want to know the
probability that the next word is the:
– P(the|its water is so transparent that)
N-grams (Cont…)
• One way to estimate this probability is from relative frequency counts:
take a very large corpus, count the number of times we see its water is so
transparent that, and count the number of times this is followed by the.
• This would be answering the question “Out of the times we saw the
history h, how many times was it followed by the word w”, as follows:
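In equation form, the relative-frequency estimate described above is:
$$P(\text{the} \mid \text{its water is so transparent that}) = \frac{C(\text{its water is so transparent that the})}{C(\text{its water is so transparent that})}$$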
• With a large enough corpus, such as the web, we can compute these
counts and estimate the probability using the equation above.
N-grams (Cont…)
• If we wanted to know the joint probability of
an entire sequence of words like its water is so
transparent, we could do it by asking “out of
all possible sequences of five words, how
many of them are its water is so transparent?”
• We would have to get the count of its water is
so transparent and divide by the sum of the
counts of all possible five-word sequences.
• That seems rather a lot to estimate!
N-grams (Cont…)
• For this reason, we’ll need to introduce cleverer ways of
estimating the probability of a word w given a history h, or
the probability of an entire word sequence W.
• Let’s start by formalizing some notation.
• To represent the probability of a particular random variable
Xi taking on the value “the”, or P(Xi = “the”), we will use the
simplification P(the).
• We’ll represent a sequence of n words either as $w_1, \ldots, w_n$ or as $w_1^n$.
• For the joint probability of each word in a sequence having a particular value, $P(X = w_1, Y = w_2, Z = w_3, \ldots, W = w_n)$, we’ll use $P(w_1, w_2, \ldots, w_n)$.
N-grams (Cont…)
• Now how can we compute probabilities of entire sequences like $P(w_1, w_2, \ldots, w_n)$?
• One thing we can do is decompose this
probability using the chain rule of probability:
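Written out, the chain rule decomposes the joint probability as:
$$P(w_1^n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2) \cdots P(w_n \mid w_1^{n-1}) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})$$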
N-grams (Cont…)
• The chain rule shows the link between computing the joint probability of a
sequence and computing the conditional probability of a word given
previous words.
• The previous equation suggests that we could estimate the joint probability of an entire sequence of words by multiplying together a number of conditional probabilities.
• But using the chain rule doesn’t really seem to help us!
• We don’t know any way to compute the exact probability of a word given a long sequence of preceding words, $P(w_n \mid w_1^{n-1})$.
• As we said above, we can’t just estimate by counting the number of times
every word occurs following every long string, because language is
creative and any particular context might have never occurred before!
• The intuition of the N-gram model is that instead of computing the
probability of a word given its entire history, we can approximate the
history by just the last few words.
Bi-Gram
• The bigram model, for example, approximates the probability of a word given all the previous words, $P(w_n \mid w_1^{n-1})$, by using only the conditional probability of the preceding word, $P(w_n \mid w_{n-1})$.
• In other words, instead of computing the probability P(the|Walden Pond’s water is so transparent that), we approximate it with the probability P(the|that).
• When we use a bigram model to predict the conditional probability of the next word, we are thus making the following approximation:
$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1})$$
Markov Assumption
• The assumption that the probability of a
word depends only on the previous word is
called a Markov assumption.
• Markov models are the class of probabilistic
models that assume we can predict the
probability of some future unit without
looking too far into the past.
Generalize Bi-grams to N-grams
• We can generalize the bigram (which looks
one word into the past) to the trigram (which
looks two words into the past) and thus to the
N-gram (which looks N−1 words into the past).
• Thus, the general equation for this N-gram approximation to the conditional probability of the next word in a sequence is:
$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1})$$
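As a minimal illustration of this approximation, the sketch below truncates a history to its last N−1 words; the function name and the toy history are illustrative, not taken from the slides.
```python
def ngram_context(history, n):
    """Keep only the last n-1 words of the history (the N-gram approximation)."""
    if n <= 1:
        return []  # a unigram model ignores the history entirely
    return history[-(n - 1):]

history = "its water is so transparent that".split()
print(ngram_context(history, 2))  # ['that'] -> bigram context
print(ngram_context(history, 3))  # ['transparent', 'that'] -> trigram context
```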
Generalize Bi-grams to N-grams
(Cont…)
• Given the bigram assumption for the probability of an individual word, we can compute the probability of a complete word sequence by substituting the bigram approximation
$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1})$$
into the chain rule, which gives:
$$P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})$$
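A minimal sketch of this computation in Python, assuming the bigram probabilities have already been estimated and stored in a dictionary (the probability values below are made up purely for illustration):
```python
import math

# Hypothetical bigram probabilities P(w_k | w_{k-1}); real values would come
# from corpus counts, as described in the MLE section that follows.
bigram_prob = {
    ("<s>", "i"): 0.25,
    ("i", "want"): 0.33,
    ("want", "food"): 0.05,
    ("food", "</s>"): 0.40,
}

def sentence_probability(words, probs):
    """Multiply P(w_k | w_{k-1}) over the sentence padded with <s> and </s>.

    Sums log probabilities to avoid numeric underflow on longer sentences.
    """
    padded = ["<s>"] + words + ["</s>"]
    log_p = 0.0
    for prev, word in zip(padded, padded[1:]):
        log_p += math.log(probs.get((prev, word), 1e-12))  # tiny floor for unseen bigrams
    return math.exp(log_p)

print(sentence_probability("i want food".split(), bigram_prob))  # = 0.25*0.33*0.05*0.40
```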
Maximum Likelihood Estimation (MLE)
• How do we estimate these bigram or N-gram
probabilities?
• An intuitive way to estimate probabilities is
called maximum likelihood estimation or MLE.
• We get the MLE estimate for the parameters
of an N-gram model by getting counts from a
corpus, and normalizing the counts so that
they lie between 0 and 1.
Maximum Likelihood Estimation (MLE)
(Cont…)
• For example, to compute a particular bigram probability of a word y given a previous word x, we’ll compute the count of the bigram C(x,y) and normalize by the sum of all the bigrams that share the same first word x:
$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{\sum_{w} C(w_{n-1}\,w)}$$
• We can simplify this equation, since the sum of all bigram counts that start with a given word $w_{n-1}$ must be equal to the unigram count for that word $w_{n-1}$:
$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{C(w_{n-1})}$$
Maximum Likelihood Estimation (MLE)
(Cont…)
• Let’s work through an example using a mini-corpus of
three sentences.
• We’ll first need to augment each sentence with a
special symbol <s> at the beginning of the sentence, to
give us the bigram context of the first word.
• We’ll also need a special end-symbol </s>.
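A minimal sketch of this MLE estimation in Python; the three-sentence corpus below is a hypothetical stand-in, since the slide’s own mini-corpus is not reproduced in this text:
```python
from collections import Counter

# Hypothetical three-sentence mini-corpus.
corpus = [
    "i am sam",
    "sam i am",
    "i do not like green eggs and ham",
]

# Augment each sentence with <s> and </s>, then collect unigram and bigram counts.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def mle_bigram(w_prev, w):
    """MLE estimate: P(w | w_prev) = C(w_prev w) / C(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(mle_bigram("<s>", "i"))    # 2/3: 'i' starts two of the three sentences
print(mle_bigram("i", "am"))     # 2/3
print(mle_bigram("am", "</s>"))  # 1/2
```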
Example 2
• Here are some text-normalized sample user queries extracted from the Berkeley Restaurant Project.
– can you tell me about any good cantonese restaurants
close by
– mid priced thai food is what i’m looking for tell me
about chez panisse
– can you give me a listing of the kinds of food that are
available
– i’m looking for a good place to eat breakfast
– when is caffe venezia open during the day
Example 2 (Cont…)
• The table shows the bigram counts from a piece of a bigram grammar from the Berkeley Restaurant Project.
• Note that the majority of the values are zero.
• In fact, we have chosen the sample words to cohere with each
other; a matrix selected from a random set of seven words would
be even more sparse.
Example 2 (Cont…)
• The following table shows the bigram probabilities after normalization (dividing each cell in the previous table by the appropriate unigram count for its row, taken from the following set of unigram counts):
• Here are a few other useful probabilities:
Example 2 (Cont…)
• Now we can compute the probability of
sentences like I want English food by simply
multiplying the appropriate bigram
probabilities together, as follows:
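With the sentence-boundary symbols <s> and </s> added as introduced earlier, the computation decomposes into bigram probabilities whose values come from the tables referenced above:
$$P(\text{<s> i want english food </s>}) = P(\text{i} \mid \text{<s>})\,P(\text{want} \mid \text{i})\,P(\text{english} \mid \text{want})\,P(\text{food} \mid \text{english})\,P(\text{</s>} \mid \text{food})$$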