Language Model (N-Gram)
Suriandi Jayakrista S.T., M.T.
Kalbis Institute
N-gram
• The probabilistic N-gram model is used to predict the next likely word from the previous N-1 words. Statistical models of word sequences like this are often called language models (LMs).
• Estimators such as N-grams assign a probability to the next possible word, and these probabilities can be combined to score an entire sentence. The N-gram model is one of the most important models in speech and language processing, both for estimating the probability of the next word and of a whole sequence.
• N-grams take as many words as we need from a sentence. For example, for the sentence “Fahmi adalah anak yang rajin” (“Fahmi is a diligent child”), the N-grams of the sentence are as follows (see the sketch after this list):
1. Unigrams: Fahmi, adalah, anak, yang, rajin
2. Bigrams: Fahmi adalah, adalah anak, anak yang, yang rajin
3. Trigrams: Fahmi adalah anak, adalah anak yang, anak yang rajin
4. etc.
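A minimal Python sketch (ours, not from the original slides) of this sliding-window extraction; the function name ngrams is our own:

```python
def ngrams(sentence, n):
    """Return all n-grams of a sentence as space-joined strings."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "Fahmi adalah anak yang rajin"
print(ngrams(sentence, 1))  # ['Fahmi', 'adalah', 'anak', 'yang', 'rajin']
print(ngrams(sentence, 2))  # ['Fahmi adalah', 'adalah anak', 'anak yang', 'yang rajin']
print(ngrams(sentence, 3))  # ['Fahmi adalah anak', 'adalah anak yang', 'anak yang rajin']
```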
What is an N-gram?
• An n-gram is a contiguous subsequence of n items from a given sequence.
• Unigram: n-gram of size 1
• Bigram: n-gram of size 2
• Trigram: n-gram of size 3
• Item:
• Phonemes
• Syllables
• Letters
• Words
• Anything else depending on the application.
Example
the dog smelled like a skunk
• Bigram:
• "# the", "the dog", "dog smelled", "smelled like", "like a", "a skunk", "skunk #"
• Trigram:
• "# the dog", "the dog smelled", "dog smelled like", "smelled like a", "like a skunk", "a skunk #"
How to predict the next word?
• Assume a language has T word types in its lexicon; how likely is word
x to follow word y?
• Solutions:
• Estimate likelihood of x occurring in new text, based on its general frequency
of occurrence estimated from a corpus
• popcorn is more likely to occur than unicorn
• Condition the likelihood of x occurring on the context of previous words
(bigrams, trigrams, …); see the counting sketch after this list
• mythical unicorn is more likely than mythical popcorn
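Both estimates come down to counting. A minimal Python sketch over an assumed toy corpus (the corpus and all numbers here are illustrative only):

```python
from collections import Counter

corpus = "the mythical unicorn ate popcorn the unicorn was mythical".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_unigram(w):
    """General frequency: P(w) = count(w) / total words."""
    return unigram_counts[w] / len(corpus)

def p_bigram(x, y):
    """Conditioned on the previous word: P(x | y) = count(y x) / count(y)."""
    return bigram_counts[(y, x)] / unigram_counts[y]

print(p_unigram("popcorn"))             # general likelihood of 'popcorn'
print(p_bigram("unicorn", "mythical"))  # likelihood of 'unicorn' after 'mythical' -> 0.5
```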
Statistical View
• The task of predicting the next word can be stated as:
• attempting to estimate the probability function P(w_n | w_1, …, w_{n-1})
• In other words:
• we use a classification of the previous words (the history) to predict the next word.
• On the basis of having looked at a lot of text, we know which words tend to
follow other words.
N-Gram Models
• Models sequences using the statistical properties of N-grams
• Idea (Shannon):
• given a sequence of letters, what is the likelihood of the next letter?
• From training data, derive a probability distribution for the next letter given a
history of size n.
• Markov assumption: the probability of the next word depends only
on the previous k words.
• Common N-grams: unigrams, bigrams, and trigrams
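In symbols (our rendering; the slide states it only in words), the k-th order Markov assumption, with k = 1 giving the bigram model and k = 2 the trigram model:

```latex
P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-k}^{n-1})
```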
Using N-Grams
• For N-gram models we want the probability of a whole word sequence:
P(w_1, w_2, ..., w_n)
• By the chain rule we can decompose a joint probability, e.g. P(w_1, w_2, w_3), as follows:
P(w_1, w_2, ..., w_n) = P(w_1 | w_2, w_3, ..., w_n) P(w_2 | w_3, ..., w_n) … P(w_{n-1} | w_n) P(w_n)
Equivalently, written in the forward direction as a product:

```latex
P(w_1^n) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})
```
Example
• Compute the product of the component conditional probabilities:
• P(the mythical unicorn) = P(the) * P(mythical | the) * P(unicorn | the mythical)
• How do we calculate these probabilities?
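A standard answer, anticipated here and made concrete in the following slides, is relative-frequency (maximum likelihood) estimation from corpus counts C(·); for bigrams:

```latex
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{C(w_{n-1})}
```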
Training and Testing
• N-Gram probabilities come from a training corpus
• overly narrow corpus: probabilities don't generalize
• overly general corpus: probabilities don't reflect task or domain
• A separate test corpus is used to evaluate the model
A Simple Bigram Example
• Estimate the likelihood of the sentence:
I want to eat Chinese food.
• P(I want to eat Chinese food) = P(I | <start>) P(want | I) P(to | want) P(eat |
to) P(Chinese | eat) P(food | Chinese) P(<end>|food)
• What do we need to calculate these likelihoods?
• Bigram probabilities for each word pair sequence in the sentence
• Calculated from a large corpus
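A sketch of this computation in Python; the bigram probabilities below are hypothetical placeholders, not estimates from any real corpus:

```python
# Hypothetical bigram probabilities (placeholders, not real corpus estimates).
bigram_p = {
    ("<start>", "I"): 0.25, ("I", "want"): 0.32, ("want", "to"): 0.65,
    ("to", "eat"): 0.26, ("eat", "Chinese"): 0.02, ("Chinese", "food"): 0.56,
    ("food", "<end>"): 0.68,
}

def sentence_probability(words, bigram_p):
    """Multiply bigram probabilities over the <start>/<end>-padded sequence."""
    padded = ["<start>"] + words + ["<end>"]
    p = 1.0
    for prev, word in zip(padded, padded[1:]):
        p *= bigram_p.get((prev, word), 0.0)  # unseen bigrams get probability 0 here
    return p

print(sentence_probability("I want to eat Chinese food".split(), bigram_p))
```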
Early Bigram Probabilities from BERP
The Berkeley Restaurant Project (BeRP) is a testbed for a speech recognition system being developed
by the International Computer Science Institute in Berkeley, CA (http://www.icsi.berkeley.edu/real/berp.html)
• P(I want to eat British food) =
P(I | <start>) P(want | I) P(to | want) P(eat | to)
P(British | eat) P(food | British) =
.25 * .32 * .65 * .26 * .001 * .60 ≈ .0000081
• N-gram models can be trained by counting and normalization
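A minimal sketch of that counting-and-normalization step on an assumed two-sentence toy corpus (a real model would also need smoothing to handle unseen bigrams):

```python
from collections import Counter

sentences = [
    "i want to eat chinese food",
    "i want to eat british food",
]

unigram_counts = Counter()
bigram_counts = Counter()
for s in sentences:
    words = ["<start>"] + s.split() + ["<end>"]
    unigram_counts.update(words[:-1])            # context counts C(w)
    bigram_counts.update(zip(words, words[1:]))  # pair counts C(w_prev w)

def p(word, prev):
    """Normalization: P(word | prev) = C(prev word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p("want", "i"))       # 1.0 -- 'want' always follows 'i' in this toy corpus
print(p("chinese", "eat"))  # 0.5
```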
References
• https://web.cs.hacettepe.edu.tr/~ilyas/Courses/CMP711/lec03-LanguageModels.pdf
