Jarrar © 2014 1
Dr. Mustafa Jarrar
Sina Institute, University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Artificial Intelligence
Probabilistic Language Modeling
Introduction to N-grams
Lecture Notes on Probabilistic Language Modeling,
University of Birzeit, Palestine
2014
Jarrar © 2014 2
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2011/11/artificial-intelligence-fall-2011.html
Most of the material is based on [1]
Jarrar © 2014 3
Outline
• Probabilistic Language Models
• Chain Rule
• Markov Assumption
• N-gram
• Example
• Available language models
• Evaluate Probabilistic Language Models
Acknowledgement: This lecture is largely based on Dan Jurafsky’s online course on NLP, which can
be accessed at http://www.youtube.com/watch?v=s3kKlUBa3b0
Keywords: Natural Language Processing, NLP, Language Model, Probabilistic Language Models,
Chain Rule, Markov Assumption, Unigram, Bigram, N-gram, Corpus
(Arabic keywords: computational linguistics, linguistic ambiguity, statistical language analysis, language applications, Markov assumption, corpus, automatic natural language processing)
Jarrar © 2014 4
Why Probabilistic Language Models
Goal: assign a probability to a sentence (“as used by native speakers”)
Why do we need probabilistic language models?
Machine Translation: to generate better translations
P(high winds tonite) > P(large winds tonite)
Spell Correction: to choose the correction that is much more likely (i.e., more likely to be correct)
The office is about fifteen minuets from my house
P(about fifteen minutes from) > P(about fifteen minuets from)
Speech Recognition
P(I saw a van) >> P(eyes awe of an)
+ Summarization, question-answering, etc., etc.!!
Jarrar © 2014 5
Probabilistic Language Modeling
Goal: Given a corpus, compute the probability of a sentence W or sequence of
words w1 w2 w3 w4 w5…wn:
P(W) = P(w1,w2,w3,w4,w5…wn)
P(How to cook rice) = P(How, to, cook, rice)
Related task: probability of an upcoming word. That is, given the sentence
w1,w2,w3,w4, what is the probability that w5 will be the next word:
P(w5|w1,w2,w3,w4) //P(rice| how, to, cook)
related to P(w1,w2,w3,w4,w5) //P(how, to, cook, rice)
A model that computes:
P(W) or P(wn|w1,w2…wn-1) is called a language model.
A better name might be “the grammar”, but “language model” is the standard term.
 Intuition: let’s rely on the Chain Rule of Probability
Jarrar © 2014 6
Reminder: The Chain Rule
Recall the definition of conditional probabilities:
P(A|B) = P(A,B) / P(B), or rewriting: P(A,B) = P(A|B) × P(B)
More variables:
P(A,B,C,D) = P(A) × P(B|A) × P(C|A,B) × P(D|A,B,C)
Example: P(“its water is so transparent”) =
P(its) × P(water|its) × P(is|its water) × P(so|its water is) × P(transparent|its water is so)
The Chain Rule in general is:
P(w1,w2,w3,…,wn) = P(w1) × P(w2|w1) × P(w3|w1,w2) × … × P(wn|w1,…,wn-1) = ∏i P(wi | w1 w2 … wi-1)
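To make the decomposition concrete, here is a minimal Python sketch (not from the original slides); `cond_prob` is a hypothetical stand-in for any model that can score P(word | history):

```python
def sentence_prob(words, cond_prob):
    """Chain rule: P(w1..wn) = product over i of P(wi | w1..wi-1)."""
    p = 1.0
    for i, w in enumerate(words):
        p *= cond_prob(w, words[:i])  # P(wi | w1 ... wi-1)
    return p

# Toy stand-in model that ignores its history (illustration only):
toy = lambda w, history: 0.1
print(sentence_prob("its water is so transparent".split(), toy))  # 0.1**5
```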
Jarrar © 2014 7
How to Estimate Probabilities
Given a large corpus of English (one that represents the language), could
we just count all word sequences and estimate these probabilities directly?
No! Too many possible sentences!
We’ll never have enough data (counts of all possible sentences)
to estimate these.
P(the | its water is so transparent that) = Count(its water is so transparent that the) / Count(its water is so transparent that)
Jarrar © 2014 8
Markov Assumption
Instead, we apply a simplifying assumption:
Markov suggests: instead of counting all possible sentences, it is enough
to count only the last few words.
In other words, approximate each component in the product:
P(wi | w1 w2 … wi-1) ≈ P(wi | wi-k … wi-1)
so that
P(w1 w2 … wn) ≈ ∏i P(wi | wi-k … wi-1)
Example: P(the | its water is so transparent that) ≈ P(the | that)
Or maybe better: P(the | its water is so transparent that) ≈ P(the | transparent that)
Andrei Markov (1856–1922), Russian mathematician
Based on [2]
Jarrar © 2014 9
Unigram Model - Simplest Case of a Markov Model
Estimate the probability of a whole sequence of words by the product of the
probabilities of the individual words:
P(w1 w2 … wn) ≈ ∏i P(wi)
P(its water is so transparent that the) ≈
P(its) x P(water) x P(is) x P(so) x P(transparent) x P(that) x P(the)
Example of some automatically generated sentences from a unigram
model (words are independent):
fifth, an, of, futures, the, an, incorporated, a, a,
the, inflation, most, dollars, quarter, in, is, mass
thrift, did, eighty, said, hard, 'm, july, bullish
that, or, limited, the
 This is not really a useful model
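A minimal unigram sketch in Python, assuming whitespace tokenization and no smoothing (unseen words simply get probability zero); the ten-word `corpus` is made up just to make the sketch runnable:

```python
from collections import Counter

corpus = "its water is so transparent that the water is clear".split()
counts = Counter(corpus)
total = len(corpus)

def unigram_prob(word):
    return counts[word] / total  # MLE: Count(w) / N; 0 for unseen words

def sentence_prob(sentence):
    p = 1.0
    for w in sentence.split():
        p *= unigram_prob(w)  # words are treated as independent
    return p

print(sentence_prob("its water is so transparent"))
```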
Jarrar © 2014 10
Bigram Model
Condition on the previous word: estimate the probability of a word given the
entire prefix (from the beginning to the previous word) by the previous word only:
P(wi | w1 w2 … wi-1) ≈ P(wi | wi-1)
P(its water is so transparent that the) ≈ P(water | its) x P(is | water) x
P(so | is) x P(transparent | so) x P(that | transparent) x P(the | that)
 This bigram conditioning is still a weak approximation of the language!
Jarrar © 2014 11
N-gram models
We can extend to 3-grams, 4-grams, 5-grams.
In general this is an insufficient model of language
– because language has long-distance dependencies. Try predicting “crashed” in:
“The computer which I had just put into the machine room on the fifth
floor crashed.”
Capturing the dependency between “computer” and “crashed” would require
counting very long word sequences.
But in practice we can often get away with N-gram models.
Jarrar © 2014 12
Estimating Bigram Probabilities
The Maximum Likelihood Estimate: given word wi-1, count how many
times it was followed by word wi:
P(wi | wi-1) = Count(wi-1 wi) / Count(wi-1)
Example: Sample Corpus
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
For instance, P(I | <s>) = 2/3, P(am | I) = 2/3, P(Sam | am) = 1/2.
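A sketch computing these MLE bigram probabilities from the sample corpus above (no smoothing):

```python
from collections import Counter

sentences = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    tokens = s.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs

def bigram_prob(w, prev):
    """MLE: P(w | prev) = Count(prev w) / Count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(bigram_prob("I", "<s>"))   # 2/3: <s> is followed by I in 2 of 3 sentences
print(bigram_prob("Sam", "am"))  # 1/2
```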
Jarrar © 2014 13
Another Example
… can you tell me about any good cantonese restaurants close by
mid priced thai food is what i’m looking for
tell me about chez panisse
can you give me a listing of the kinds of food that are available
i’m looking for a good place to eat breakfast
when is caffe venezia open during the day …
Given this larger corpus
Bigram Counts (out of 9,222 sentences)
[The bigram count table is not reproduced here.] Many counts are zero.
Jarrar © 2014 14
Raw Bigram Probabilities
Normalize the previous table of bigram counts by the unigram counts:
P(wi | wi-1) = Count(wi-1 wi) / Count(wi-1)
[The resulting bigram probability table is not reproduced here.]
Jarrar © 2014 15
Bigram estimates of sentence probabilities
P(<s> I want english food </s>) ≈
P(I|<s>)
× P(want|I)
× P(english|want)
× P(food|english)
× P(</s>|food)
= .000031
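The same arithmetic as a short sketch. P(I|<s>) = .25 and P(english|want) = .0011 match the next slide; the values for P(want|I), P(food|english), and P(</s>|food) are assumptions taken to reproduce the stated result, since the lecture's bigram probability table is not in this text:

```python
# Bigram probabilities; P(want|I)=.33, P(food|english)=.5, and
# P(</s>|food)=.68 are assumed values (the lecture's table is not
# reproduced in this text).
probs = {
    ("<s>", "I"): 0.25,
    ("I", "want"): 0.33,
    ("want", "english"): 0.0011,
    ("english", "food"): 0.5,
    ("food", "</s>"): 0.68,
}

p = 1.0
for bigram in probs:  # multiply the five bigram probabilities
    p *= probs[bigram]
print(p)  # ≈ 0.000031
```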
Jarrar © 2014 16
What kinds of knowledge?
P(english | want) = .0011
P(chinese | want) = .0065
P(to | want) = .66
P(eat | to) = .28
P(food | to) = 0
P(want | spend) = 0
P (i | <s>) = .25
 These numbers reflect how English is used in practice (our corpus).
Jarrar © 2014 17
Practical Issues
In practice we don’t store these probabilities as raw probabilities; we store them
in the form of log probabilities. That is, we do everything in log space, for two
reasons:
– To avoid underflow (multiplying many small numbers yields arithmetic underflow).
– Adding is faster than multiplying.
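A small sketch of the log-space trick, using the same five bigram probabilities as in the sentence-probability example above:

```python
import math

probs = [0.25, 0.33, 0.0011, 0.5, 0.68]

# log(p1 * p2 * ... * pn) = log p1 + log p2 + ... + log pn
log_p = sum(math.log(p) for p in probs)  # summing logs avoids underflow
print(log_p)            # ≈ -10.386
print(math.exp(log_p))  # ≈ 0.000031, the same product as before
```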
Jarrar © 2014 18
Available Resources
There are many available language models that you can try:
…
Google N-Gram Release, August 2006
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
Jarrar © 2014 19
Google N-Gram Release
serve as the incoming 92
serve as the incubator 99
serve as the independent 794
serve as the index 223
serve as the indication 72
serve as the indicator 120
serve as the indicators 45
serve as the indispensable 111
serve as the indispensible 40
serve as the individual 234
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
Jarrar © 2014 20
Other Models
Google Book N-grams Viewer
http://ngrams.googlelabs.com/
SRILM - The SRI Language Modeling Toolkit
http://www.speech.sri.com/projects/srilm/
Jarrar © 2014 21
How to Know a Language Model is Good?
Does the language model prefer good sentences to bad ones?
– Assign higher probability to “real” or “frequently observed”
sentences
• than to “ungrammatical” or “rarely observed” sentences?
Train the parameters of our model on a training set.
Test the model’s performance on data you haven’t seen.
– A test set is an unseen dataset that is different from our training set,
totally unused.
– An evaluation metric tells us how well our model does on the test set.
 Two ways to evaluate a language model:
 Extrinsic evaluation (in vivo)
 Intrinsic evaluation (perplexity)
Jarrar © 2014 22
Extrinsic evaluation of N-gram models
Best evaluation for comparing models A and B
– Put each model in a task
• spelling corrector, speech recognizer, MT system
– Run the task, get an accuracy for A and for B
• How many misspelled words are corrected properly
• How many words are translated correctly
– Compare accuracy for A and B
 Extrinsic evaluation is time-consuming; can take days or weeks
Jarrar © 2014 23
Intuition of Perplexity (Intrinsic Evaluation)
– How well can we predict the next word?
A better model of a text is one which assigns a higher probability to the
word that actually occurs, i.e., gives the highest P(sentence).
I always order pizza with cheese and ____
The 33rd President of the US was ____
I saw a ____
mushrooms 0.1
pepperoni 0.1
anchovies 0.01
….
fried rice 0.0001
….
and 1e-100
Perplexity is the inverse probability of the test set,
normalized by the number of words:
PP(W) = P(w1 w2 … wN)^(-1/N)
Minimizing perplexity is the same as maximizing probability
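A perplexity sketch in log space; `bigram_prob` is assumed to be a trained model's scorer (such as the MLE estimator sketched earlier, though a real evaluation needs smoothing so that no test bigram gets probability zero):

```python
import math

def perplexity(test_words, bigram_prob):
    """PP(W) = P(w1..wN) ** (-1/N), computed in log space."""
    log_p = 0.0
    for prev, w in zip(test_words, test_words[1:]):
        log_p += math.log(bigram_prob(w, prev))  # fails on zero probs
    n = len(test_words) - 1  # number of predicted words
    return math.exp(-log_p / n)  # lower perplexity = better model
```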
Jarrar © 2014 24
Project
Develop an autocomplete web form, based on a 3-gram language model.
Each student needs to collect an Arabic corpus of at least 10,000 words (10
documents). Students can use the same corpus, fully or partially.
Tokenize the corpus into tokens/words, then build a trigram language model for
this corpus. That is, your language model = a table of word counts + a table of
the probability (or log probability) of each 3-gram.
Develop an autocomplete web form that uses your language model to
autocomplete what users write (no matter how long their queries).
Deadline: 23/9/2013
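A minimal starting point for the trigram model (assuming whitespace tokenization; real Arabic text needs proper tokenization and normalization first):

```python
from collections import Counter, defaultdict

trigrams = defaultdict(Counter)  # (w1, w2) -> counts of the next word

def train(sentences):
    for s in sentences:
        tokens = ["<s>", "<s>"] + s.split() + ["</s>"]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(a, b)][c] += 1

def autocomplete(text):
    """Suggest likely next words given only the last two words typed."""
    tokens = ["<s>", "<s>"] + text.split()
    return trigrams[tuple(tokens[-2:])].most_common(3)  # top-3 suggestions

train(["how to cook rice", "how to cook pasta"])
print(autocomplete("how to"))  # [('cook', 2)]
```

Because only the last two words condition the prediction, the form can autocomplete no matter how long the user's query is.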
Jarrar © 2014 25
Idea for Graduate Project
Take Curras (a well-annotated corpus for
the Palestinian dialect, developed at Sina
Institute), and build and evaluate a
language model for this corpus.
Jarrar © 2014 26
References
[1] Dan Jurafsky: From Languages to Information, course notes
http://web.stanford.edu/class/cs124
[2] Wikipedia: Andrei Markov
http://en.wikipedia.org/wiki/Andrey_Markov