SlideShare a Scribd company logo
1 of 17
Language Model (N-Gram)
Suriandi Jayakrista S.T.,M.T.
Kalbis Institute
N-gram
• Model probabilistik N-gram, merupakan model yang digunakan untuk
memprediksi kata berikutnya yang mungkin dari kata N-1 sebelumnya. Model
statistika dari urutan kata ini seringkali disebut juga sebagai model bahasa
(language models / LMs).
• Model estimasi seperti N-gram memberikan probabilitas kemungkinan pada
kata berikutnya yang mungkin dapat digunakan untuk melakukan
kemungkinan penggabungan pada keseluruhan kalimat. Model N-
gram merupakan model yang paling penting dalam pemrosesan suara
ataupun bahasa baik untuk memperkirakan probabilitas kata berikutnya
maupun keseluruhan sequence.
• N-Grams ini akan mengambil sejumlah kata yang kita butuhkan
dari sebuah kalimat. Seperti contohnya kalimat “Fahmi adalah
anak yang rajin”, maka bentuk N-Grams dari kalimat tersebut
adalah :
1.Unigram : Fahmi, adalah, anak, yang, rajin
2.Bigram : Fahmi adalah, adalah anak, anak yang, yang rajin
3.Trigram : Fahmi adalah anak, anak yang rajin
4.dst.
What is an N-gram?
• An n-gram is a subsequence of n items from a given sequence.
• Unigram: n-gram of size 1
• Bigram: n-gram of size 2
• Trigram: n-gram of size 3
• Item:
• Phonemes
• Syllables
• Letters
• Words
• Anything else depending on the application.
Example
the dog smelled like a skunk
• Bigram:
• "# the”, “the dog”, “dog smelled”, “ smelled like”, “like a”, “a skunk”, “skunk#”
• Trigram:
• "# the dog", "the dog smelled", "dog smelled like", "smelled like a", "like a
skunk" and "a skunk #".
How to predict the next word?
• Assume a language has T word types in its lexicon, how likely is word
x to follow word y?
• Solutions:
• Estimate likelihood of x occurring in new text, based on its general frequency
of occurrence estimated from a corpus
• popcorn is more likely to occur than unicorn
• Condition the likelihood of x occurring in the context of previous words
(bigrams, trigrams,…)
• mythical unicorn is more likely than mythical popcorn
Statistical View
• The task of predicting the next word can be stated as:
• attempting to estimate the probability function P:
• In other words:
• we use a classification of the previous history words, to predict the next word.
• On the basis of having looked at a lot of text, we know which words tend to
follow other words.
N-Gram Models
• Models Sequences using the Statistical Properties of N-Grams
• Idea: Shannon
• given a sequence of letters, what is the likelihood of the next letter?
• From training data, derive a probability distribution for the next letter given a
history of size n.
• Markov assumption: the probability of the next word depends only
on the previous k words.
• Common N-grams:
Using N-Grams
• For N-gram models
• P(w1,w2, ...,wn)
• By the Chain Rule we can decompose a joint probability, e.g. P(w1,w2,w3) as
follows:
P(w1,w2, ...,wn) = P(w1|w2,w3,...,wn) P(w2|w3, ...,wn) … P(wn-1|wn) P(wn)




n
k
wk
wk
P
wn
P
1
1
1
)
|
(
)
1
(
Example
• Compute the product of component conditional probabilities?
• P(the mythical unicorn) = P(the) * P(mythical | the) * P(unicorn|the mythical)
• How to calculate these probability?
Training and Testing
• N-Gram probabilities come from a training corpus
• overly narrow corpus: probabilities don't generalize
• overly general corpus: probabilities don't reflect task or domain
• A separate test corpus is used to evaluate the model
A Simple Bigram Example
• Estimate the likelihood of the sentence:
I want to eat Chinese food.
• P(I want to eat Chinese food) = P(I | <start>) P(want | I) P(to | want) P(eat |
to) P(Chinese | eat) P(food | Chinese) P(<end>|food)
• What do we need to calculate these likelihoods?
• Bigram probabilities for each word pair sequence in the sentence
• Calculated from a large corpus
Early Bigram Probabilities from BERP
The Berkeley Restaurant Project (BeRP) is a testbed for a speech recognition system is being developed
by the International Computer Science Institute in Berkeley, CA
• P(I want to eat British food) =
P(I | <start>) P(want | I) P(to | want) P(eat | to)
P(British | eat) P(food | British) =
.25*.32*.65*.26*.001*.60 = .000080
• N-gram models can be trained by counting and normalization
References
• https://web.cs.hacettepe.edu.tr/~ilyas/Courses/CMP711/lec03-
LanguageModels.pdf

More Related Content

What's hot

Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and originShubhankar Mohan
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Rajnish Raj
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Iffat Anjum
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingHemantha Kulathilake
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Natural language-processing
Natural language-processingNatural language-processing
Natural language-processingHareem Naz
 

What's hot (20)

Nlp
NlpNlp
Nlp
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
NLP_KASHK:POS Tagging
NLP_KASHK:POS TaggingNLP_KASHK:POS Tagging
NLP_KASHK:POS Tagging
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 
Word embedding
Word embedding Word embedding
Word embedding
 
Semantic analysis
Semantic analysisSemantic analysis
Semantic analysis
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Machine translation
Machine translationMachine translation
Machine translation
 
Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Nlp
NlpNlp
Nlp
 
NLP_KASHK:Morphology
NLP_KASHK:MorphologyNLP_KASHK:Morphology
NLP_KASHK:Morphology
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
Natural language-processing
Natural language-processingNatural language-processing
Natural language-processing
 

Similar to Language Model (N-Gram).pptx

12.4.1 n-grams.pdf
12.4.1 n-grams.pdf12.4.1 n-grams.pdf
12.4.1 n-grams.pdfRajMani28
 
lec03-LanguageModels_230214_161016.pdf
lec03-LanguageModels_230214_161016.pdflec03-LanguageModels_230214_161016.pdf
lec03-LanguageModels_230214_161016.pdfykyog
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelHemantha Kulathilake
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Lviv Data Science Summer School
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.pptmilkesa13
 
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]Sagar Ahire
 
A Neural Probabilistic Language Model.pptx
A Neural Probabilistic Language Model.pptxA Neural Probabilistic Language Model.pptx
A Neural Probabilistic Language Model.pptxRama Irsheidat
 
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...NAIST Machine Translation Study Group
 
Chaps 1-3-ai-prolog
Chaps 1-3-ai-prologChaps 1-3-ai-prolog
Chaps 1-3-ai-prologsaru40
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNetDomyoung Lee
 
Vocabulary and grammar teaching through Morphological Awareners
Vocabulary and grammar teaching through  Morphological AwarenersVocabulary and grammar teaching through  Morphological Awareners
Vocabulary and grammar teaching through Morphological AwarenersAhmad Mashhood
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprologFraboni Ec
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prologDavid Hoen
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprologYoung Alista
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prologJames Wong
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprologTony Nguyen
 

Similar to Language Model (N-Gram).pptx (20)

12.4.1 n-grams.pdf
12.4.1 n-grams.pdf12.4.1 n-grams.pdf
12.4.1 n-grams.pdf
 
Next word Prediction
Next word PredictionNext word Prediction
Next word Prediction
 
lec03-LanguageModels_230214_161016.pdf
lec03-LanguageModels_230214_161016.pdflec03-LanguageModels_230214_161016.pdf
lec03-LanguageModels_230214_161016.pdf
 
Ngrams smoothing
Ngrams smoothingNgrams smoothing
Ngrams smoothing
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
Syntax.ppt
Syntax.pptSyntax.ppt
Syntax.ppt
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt
 
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
 
A Neural Probabilistic Language Model.pptx
A Neural Probabilistic Language Model.pptxA Neural Probabilistic Language Model.pptx
A Neural Probabilistic Language Model.pptx
 
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
 
Chaps 1-3-ai-prolog
Chaps 1-3-ai-prologChaps 1-3-ai-prolog
Chaps 1-3-ai-prolog
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNet
 
Vocabulary and grammar teaching through Morphological Awareners
Vocabulary and grammar teaching through  Morphological AwarenersVocabulary and grammar teaching through  Morphological Awareners
Vocabulary and grammar teaching through Morphological Awareners
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprolog
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prolog
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprolog
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prolog
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprolog
 

Recently uploaded

Call Girls in Islamabad | 03274100048 | Call Girl Service
Call Girls in Islamabad | 03274100048 | Call Girl ServiceCall Girls in Islamabad | 03274100048 | Call Girl Service
Call Girls in Islamabad | 03274100048 | Call Girl ServiceAyesha Khan
 
Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...
Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...
Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...akbard9823
 
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...akbard9823
 
Bur Dubai Call Girls O58993O4O2 Call Girls in Bur Dubai
Bur Dubai Call Girls O58993O4O2 Call Girls in Bur DubaiBur Dubai Call Girls O58993O4O2 Call Girls in Bur Dubai
Bur Dubai Call Girls O58993O4O2 Call Girls in Bur Dubaidajasot375
 
(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts
(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts
(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad EscortsCall girls in Ahmedabad High profile
 
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...wdefrd
 
Akola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service AkolaAkola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service Akolasrsj9000
 
San Jon Motel, Motel/Residence, San Jon NM
San Jon Motel, Motel/Residence, San Jon NMSan Jon Motel, Motel/Residence, San Jon NM
San Jon Motel, Motel/Residence, San Jon NMroute66connected
 
FULL ENJOY - 9953040155 Call Girls in Noida | Delhi
FULL ENJOY - 9953040155 Call Girls in Noida | DelhiFULL ENJOY - 9953040155 Call Girls in Noida | Delhi
FULL ENJOY - 9953040155 Call Girls in Noida | DelhiMalviyaNagarCallGirl
 
Bridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.comBridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.comthephillipta
 
Olivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptxOlivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptxLauraFagan6
 
FULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | Delhi
FULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | DelhiFULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | Delhi
FULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | DelhiMalviyaNagarCallGirl
 
FULL ENJOY - 9953040155 Call Girls in Burari | Delhi
FULL ENJOY - 9953040155 Call Girls in Burari | DelhiFULL ENJOY - 9953040155 Call Girls in Burari | Delhi
FULL ENJOY - 9953040155 Call Girls in Burari | DelhiMalviyaNagarCallGirl
 
The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)thephillipta
 
SHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 MagazineSHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 MagazineShivna Prakashan
 
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call GirlsMandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girlsashishs7044
 
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | DelhiFULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | DelhiMalviyaNagarCallGirl
 
Turn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel JohnsonTurn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel Johnsonthephillipta
 
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...akbard9823
 

Recently uploaded (20)

Call Girls in Islamabad | 03274100048 | Call Girl Service
Call Girls in Islamabad | 03274100048 | Call Girl ServiceCall Girls in Islamabad | 03274100048 | Call Girl Service
Call Girls in Islamabad | 03274100048 | Call Girl Service
 
Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...
Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...
Hazratganj ] (Call Girls) in Lucknow - 450+ Call Girl Cash Payment 🧄 89231135...
 
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
 
Bur Dubai Call Girls O58993O4O2 Call Girls in Bur Dubai
Bur Dubai Call Girls O58993O4O2 Call Girls in Bur DubaiBur Dubai Call Girls O58993O4O2 Call Girls in Bur Dubai
Bur Dubai Call Girls O58993O4O2 Call Girls in Bur Dubai
 
(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts
(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts
(NEHA) Call Girls Ahmedabad Booking Open 8617697112 Ahmedabad Escorts
 
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
 
Akola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service AkolaAkola Call Girls #9907093804 Contact Number Escorts Service Akola
Akola Call Girls #9907093804 Contact Number Escorts Service Akola
 
San Jon Motel, Motel/Residence, San Jon NM
San Jon Motel, Motel/Residence, San Jon NMSan Jon Motel, Motel/Residence, San Jon NM
San Jon Motel, Motel/Residence, San Jon NM
 
FULL ENJOY - 9953040155 Call Girls in Noida | Delhi
FULL ENJOY - 9953040155 Call Girls in Noida | DelhiFULL ENJOY - 9953040155 Call Girls in Noida | Delhi
FULL ENJOY - 9953040155 Call Girls in Noida | Delhi
 
Bridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.comBridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.com
 
Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)
Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)
Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)
 
Olivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptxOlivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptx
 
FULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | Delhi
FULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | DelhiFULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | Delhi
FULL ENJOY - 9953040155 Call Girls in Shaheen Bagh | Delhi
 
FULL ENJOY - 9953040155 Call Girls in Burari | Delhi
FULL ENJOY - 9953040155 Call Girls in Burari | DelhiFULL ENJOY - 9953040155 Call Girls in Burari | Delhi
FULL ENJOY - 9953040155 Call Girls in Burari | Delhi
 
The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)
 
SHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 MagazineSHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
 
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call GirlsMandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girls
 
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | DelhiFULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
 
Turn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel JohnsonTurn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel Johnson
 
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
Aminabad @ Book Call Girls in Lucknow - 450+ Call Girl Cash Payment 🍵 8923113...
 

Language Model (N-Gram).pptx

  • 1. Language Model (N-Gram) Suriandi Jayakrista S.T.,M.T. Kalbis Institute
  • 2. N-gram • Model probabilistik N-gram, merupakan model yang digunakan untuk memprediksi kata berikutnya yang mungkin dari kata N-1 sebelumnya. Model statistika dari urutan kata ini seringkali disebut juga sebagai model bahasa (language models / LMs). • Model estimasi seperti N-gram memberikan probabilitas kemungkinan pada kata berikutnya yang mungkin dapat digunakan untuk melakukan kemungkinan penggabungan pada keseluruhan kalimat. Model N- gram merupakan model yang paling penting dalam pemrosesan suara ataupun bahasa baik untuk memperkirakan probabilitas kata berikutnya maupun keseluruhan sequence.
  • 3. • N-Grams ini akan mengambil sejumlah kata yang kita butuhkan dari sebuah kalimat. Seperti contohnya kalimat “Fahmi adalah anak yang rajin”, maka bentuk N-Grams dari kalimat tersebut adalah : 1.Unigram : Fahmi, adalah, anak, yang, rajin 2.Bigram : Fahmi adalah, adalah anak, anak yang, yang rajin 3.Trigram : Fahmi adalah anak, anak yang rajin 4.dst.
  • 4. What is an N-gram? • An n-gram is a subsequence of n items from a given sequence. • Unigram: n-gram of size 1 • Bigram: n-gram of size 2 • Trigram: n-gram of size 3 • Item: • Phonemes • Syllables • Letters • Words • Anything else depending on the application.
  • 5. Example the dog smelled like a skunk • Bigram: • "# the”, “the dog”, “dog smelled”, “ smelled like”, “like a”, “a skunk”, “skunk#” • Trigram: • "# the dog", "the dog smelled", "dog smelled like", "smelled like a", "like a skunk" and "a skunk #".
  • 6. How to predict the next word? • Assume a language has T word types in its lexicon, how likely is word x to follow word y? • Solutions: • Estimate likelihood of x occurring in new text, based on its general frequency of occurrence estimated from a corpus • popcorn is more likely to occur than unicorn • Condition the likelihood of x occurring in the context of previous words (bigrams, trigrams,…) • mythical unicorn is more likely than mythical popcorn
  • 7. Statistical View • The task of predicting the next word can be stated as: • attempting to estimate the probability function P: • In other words: • we use a classification of the previous history words, to predict the next word. • On the basis of having looked at a lot of text, we know which words tend to follow other words.
  • 8. N-Gram Models • Models Sequences using the Statistical Properties of N-Grams • Idea: Shannon • given a sequence of letters, what is the likelihood of the next letter? • From training data, derive a probability distribution for the next letter given a history of size n.
  • 9. • Markov assumption: the probability of the next word depends only on the previous k words. • Common N-grams:
  • 10. Using N-Grams • For N-gram models • P(w1,w2, ...,wn) • By the Chain Rule we can decompose a joint probability, e.g. P(w1,w2,w3) as follows: P(w1,w2, ...,wn) = P(w1|w2,w3,...,wn) P(w2|w3, ...,wn) … P(wn-1|wn) P(wn)     n k wk wk P wn P 1 1 1 ) | ( ) 1 (
  • 11. Example • Compute the product of component conditional probabilities? • P(the mythical unicorn) = P(the) * P(mythical | the) * P(unicorn|the mythical) • How to calculate these probability?
  • 12. Training and Testing • N-Gram probabilities come from a training corpus • overly narrow corpus: probabilities don't generalize • overly general corpus: probabilities don't reflect task or domain • A separate test corpus is used to evaluate the model
  • 13. A Simple Bigram Example • Estimate the likelihood of the sentence: I want to eat Chinese food. • P(I want to eat Chinese food) = P(I | <start>) P(want | I) P(to | want) P(eat | to) P(Chinese | eat) P(food | Chinese) P(<end>|food) • What do we need to calculate these likelihoods? • Bigram probabilities for each word pair sequence in the sentence • Calculated from a large corpus
  • 14. Early Bigram Probabilities from BERP The Berkeley Restaurant Project (BeRP) is a testbed for a speech recognition system is being developed by the International Computer Science Institute in Berkeley, CA
  • 15.
  • 16. • P(I want to eat British food) = P(I | <start>) P(want | I) P(to | want) P(eat | to) P(British | eat) P(food | British) = .25*.32*.65*.26*.001*.60 = .000080 • N-gram models can be trained by counting and normalization

Editor's Notes

  1. The Berkeley Restaurant Project (BeRP) is a testbed for a speech recognition system s being developed by the International Computer Science Institute in Berkeley, CA http://www.icsi.berkeley.edu/real/berp.html