Language modelling
and its use cases
What is Grammarly?
What do I do at Grammarly?
1. In the past:
a. word order
b. possessive nouns
c. sentence fragments
d. different types of verb mistakes
e. etc.
2. Now:
a. Paragraph-level checks
What is language modeling?
Models that assign a probability to a sequence of words
are called language models (LMs).
Speech recognition
Which one is the most probable?
1. It’s not easy to wreck a nice beach.
2. It’s not easy to recognize speech.
3. It’s not easy to wreck an ice beach.
Applications of language models
1. Text prediction
2. Speech recognition
3. Language identification
4. Machine translation
5. Handwriting recognition
6. Error correction
7. etc.
Statistical language modeling
Language corpora
A corpus is a body of text.
Some popular English and Ukrainian corpora:
- Gutenberg Dataset (en)
- Wikipedia corpus (en)
- UberText (ua)
- your custom corpus
- ...
Assigning a probability to a sentence
Our sentence, s: That monkey made a smart move!
Our corpus is of size N (say, 10 000 sentences).
P(s) = c(s) / N
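A minimal sketch of the counting estimate above, assuming a tiny made-up corpus (the four sentences and the helper name `sentence_probability` are illustrative, not from the talk). It also hints at why this approach does not scale: any sentence that never occurs verbatim in the corpus gets probability zero.

```python
from collections import Counter


def sentence_probability(sentence, corpus):
    """Estimate P(s) = c(s) / N by counting exact matches in the corpus."""
    counts = Counter(corpus)
    return counts[sentence] / len(corpus)


# Toy corpus, made up for illustration only.
corpus = [
    "That monkey made a smart move!",
    "The monkey is eating a banana!",
    "That monkey made a smart move!",
    "It's not easy to recognize speech.",
]

seen = sentence_probability("That monkey made a smart move!", corpus)
unseen = sentence_probability("That monkey made a clever move!", corpus)
print(seen, unseen)
```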
Joint probability: chain rule
P(<s>, That, monkey, made, a, smart, move, !, </s>) =
P(That | <s>) * P(monkey | <s>, That) * P(made | <s>, That, monkey) *
P(a | <s>, That, monkey, made) * P(smart | <s>, That, monkey, made, a) *
P(move | <s>, That, monkey, made, a, smart) *
P(! | <s>, That, monkey, made, a, smart, move) *
P(</s> | <s>, That, monkey, made, a, smart, move, !)
$P(x_1, x_2, \dots, x_n) = \prod_{i} P(x_i \mid x_1, \dots, x_{i-1})$
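The expansion above can be generated mechanically. The sketch below only builds the factors as strings; the function name `chain_rule_factors` is an assumption for illustration, and the start symbol is treated as given, as on the slide.

```python
def chain_rule_factors(tokens):
    """Return the factors P(x_i | x_1, ..., x_{i-1}) as readable strings."""
    factors = []
    for i in range(1, len(tokens)):
        history = ", ".join(tokens[:i])  # everything before position i
        factors.append(f"P({tokens[i]} | {history})")
    return factors


tokens = ["<s>", "That", "monkey", "made", "a", "smart", "move", "!", "</s>"]
factors = chain_rule_factors(tokens)
for factor in factors:
    print(factor)
```

Each factor conditions on the full history, so the histories keep growing with the sentence length.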
Markov assumption
P(<s>, <s>, That, monkey, made, a, smart, move, !, </s>) =
P(That | <s>, <s>) * P(monkey | <s>, That) * P(made | That, monkey) *
P(a | monkey, made) * P(smart | made, a) * P(move | a, smart) *
P(! | smart, move) * P(</s> | move, !)
$P(x_1, x_2, \dots, x_n) = \prod_{i} P(x_i \mid x_{i-2}, x_{i-1})$
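Under the second-order Markov assumption each factor keeps only the two previous tokens. A minimal sketch, assuming two start markers and one end marker as on the slide (the helper name `trigram_factors` is hypothetical):

```python
def trigram_factors(words):
    """Return the factors P(x_i | x_{i-2}, x_{i-1}) for a padded sentence."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    return [
        f"P({padded[i]} | {padded[i - 2]}, {padded[i - 1]})"
        for i in range(2, len(padded))
    ]


words = ["That", "monkey", "made", "a", "smart", "move", "!"]
factors = trigram_factors(words)
for factor in factors:
    print(factor)
```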
Maximum likelihood estimates
P(made | That, monkey) = c(That, monkey, made) / c(That, monkey)
P(a | monkey, made) = c(monkey, made, a) / c(monkey, made)
...
$P(x_i \mid x_{i-2}, x_{i-1}) = c(x_{i-2}, x_{i-1}, x_i) / c(x_{i-2}, x_{i-1})$
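A minimal sketch of these estimates on a toy corpus. The three sentences, the helper names, and the choice to return 0.0 for an unseen context are all assumptions made for illustration.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts over padded sentences."""
    trigrams, contexts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            contexts[tuple(padded[i - 2:i])] += 1
    return trigrams, contexts


def mle(word, context, trigrams, contexts):
    """P(word | context) = c(context, word) / c(context)."""
    if contexts[context] == 0:
        return 0.0  # unseen context: a choice made for this sketch
    return trigrams[context + (word,)] / contexts[context]


# Toy corpus, made up for illustration only.
sentences = [
    ["That", "monkey", "made", "a", "smart", "move", "!"],
    ["That", "monkey", "is", "eating", "a", "banana", "!"],
    ["The", "monkey", "made", "a", "mess", "!"],
]
trigrams, contexts = train_trigram_counts(sentences)
p_made = mle("made", ("That", "monkey"), trigrams, contexts)
p_a = mle("a", ("monkey", "made"), trigrams, contexts)
print(p_made, p_a)
```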
N-grams
An n-gram is a sequence of n words.
The monkey is eating a banana!
- unigram: The, monkey, is, eating, a, banana, !
- bigram: <s> The, The monkey, monkey is, is eating, eating a, a banana,...
- trigram: <s> <s> The, <s> The monkey, The monkey is, monkey is eating,...
- ...
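A short sketch of n-gram extraction with sentence-boundary padding. The function name `ngrams` and the choice to append an end marker for every n are assumptions of this example.

```python
def ngrams(words, n):
    """Return all n-grams of a sentence, padded with n-1 start markers."""
    padded = ["<s>"] * (n - 1) + words + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


words = ["The", "monkey", "is", "eating", "a", "banana", "!"]
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
print(bigrams[:3])
print(trigrams[:3])
```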
N-grams
From the corpus (of size 50 000) we get:

bigram              count
<s> The             5 678
The monkey             97
monkey is              65
is eating           3 440
eating a            1 675
...                   ...

trigram             count
<s> <s> The         5 678
<s> The monkey          3
The monkey is           0
monkey is eating        8
is eating a           457
...                   ...
Assigning a probability
P(<s>, <s>, That, monkey, made, a, smart, move, !, </s>) =
c(<s>, <s>, That) / c(<s>, <s>) * c(<s>, That, monkey) / c(<s>, That) *
c(That, monkey, made) / c(That, monkey) * … =
1189 / 50000 * 12 / 1189 * … ≈ 0.0000003305
$P(x_1, x_2, \dots, x_n) = \prod_{i} P(x_i \mid x_{i-2}, x_{i-1})$
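A sketch of how such a product can be computed from count ratios. Only the first two factors (1189 / 50000 and 12 / 1189) are given on the slide, so the example stops there; summing logarithms instead of multiplying is a common way to avoid underflow on long sentences, and the function name is hypothetical.

```python
import math


def product_from_counts(ratios):
    """Multiply ratios c(trigram) / c(context), accumulating in log space."""
    log_p = 0.0
    for trigram_count, context_count in ratios:
        if trigram_count == 0 or context_count == 0:
            return 0.0  # an unseen trigram zeroes the whole product
        log_p += math.log(trigram_count) - math.log(context_count)
    return math.exp(log_p)


# The two factors shown on the slide; the remaining ones are elided there.
first_two = [(1189, 50000), (12, 1189)]
partial = product_from_counts(first_two)
print(partial)
```

The 1189 cancels, so the partial product is 12 / 50000. A single zero count anywhere makes the whole sentence probability zero, which is what motivates smoothing.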
Smoothing techniques
● Add-1 smoothing
● Add-k smoothing
● Backoff
● Interpolation
● Kneser-Ney smoothing
● ...
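As an illustration of the simplest entries in this list, here is a sketch of add-k smoothing for a trigram estimate (add-1 is the case k = 1). The counts for "The monkey is" (0) and "The monkey" (97) come from the n-gram counts slide; the vocabulary size of 10 000 is an assumption.

```python
def add_k_probability(trigram_count, context_count, vocab_size, k=1.0):
    """Add-k estimate: (c(context, word) + k) / (c(context) + k * |V|)."""
    return (trigram_count + k) / (context_count + k * vocab_size)


VOCAB_SIZE = 10_000  # assumed for illustration, not given in the talk

unsmoothed = 0 / 97
add_one = add_k_probability(0, 97, VOCAB_SIZE)
add_small = add_k_probability(0, 97, VOCAB_SIZE, k=0.01)
print(unsmoothed, add_one, add_small)
```

The unseen trigram now gets a small non-zero probability, at the cost of taking probability mass away from trigrams that were actually observed.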
Statistical LM: challenges
● They do not generalize
○ red car = 2 390, blue car = 1 132, purple car = 0
● Intricate smoothing techniques
○ e.g., a fixed back-off order has to be designed by hand
● They don't capture long-range dependencies
○ That smart monkey, which I told you about, was also sitting on my car!
● Scaling to larger n-grams is very expensive
○ the number of possible n-grams over a vocabulary V is $|V|^n$
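To make the last point concrete, a quick calculation of how many distinct n-grams are possible, assuming a vocabulary of 10 000 words (the size is illustrative, not from the talk):

```python
def possible_ngrams(vocab_size, n):
    """Number of distinct n-grams over a vocabulary: |V| ** n."""
    return vocab_size ** n


VOCAB_SIZE = 10_000  # assumed for illustration

for n in range(1, 6):
    print(n, possible_ngrams(VOCAB_SIZE, n))
```

Most of these n-grams never occur in any corpus, so a count table grows sparser as n increases.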
Neural language modeling
Neural LM
image from http://torch.ch/blog/2016/07/25/nce.html
One-hot encodings
● Sparse vectors of size V (vocabulary)
image from https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
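A small pure-Python sketch of a one-hot encoding, and of how a dense word vector can be read as the product of a one-hot vector with an embedding table. The six-word vocabulary and the two-dimensional embedding values are made up for illustration.

```python
vocab = ["<s>", "</s>", "red", "blue", "purple", "car"]
word_to_id = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Sparse vector of size V with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[word_to_id[word]] = 1
    return vec


# Hypothetical embedding table, one 2-dimensional row per vocabulary word.
embeddings = [
    [0.0, 0.0], [0.0, 0.1], [0.9, 0.1],
    [0.8, 0.2], [0.85, 0.15], [0.1, 0.9],
]


def embed(word):
    """Multiply the one-hot vector by the table, which selects one row."""
    hot = one_hot(word)
    dim = len(embeddings[0])
    return [
        sum(hot[i] * embeddings[i][d] for i in range(len(vocab)))
        for d in range(dim)
    ]


print(one_hot("red"))
print(embed("red"), embed("purple"))
```

In this toy table the colour words have nearby vectors, which is the property that lets a neural LM share what it learned about "red car" with "purple car".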
Challenge accepted
● They do not generalize
○ red, blue, and black appear in similar contexts
● Intricate smoothing techniques
○ no need for additional smoothing since we use word vectors and backprop
● Doesn’t capture long-range dependencies
○ That smart monkey, which I told you about, was also sitting on my car!
● Scaling to larger ngrams is very expensive
Neural LM: challenges
● Generalize better :-)
○ brown horse, white horse, green horse ?!?
● Take a long time to train
● Very expensive
You try it
● KenLM
○ https://github.com/kpu/kenlm
● Simple RNN language model
○ https://github.com/pytorch/examples/tree/master/word_language_model
● LSTM by Salesforce
○ https://github.com/salesforce/awd-lstm-lm
Let’s have some fun ;-)
from http://karpathy.github.io/2015/05/21/rnn-effectiveness/
● Baby name generation:
○ Alessia, Mareanne, Chrestina, Hi, Saddie
● Leo Tolstoy’s War and Peace:
○ "Why do what that day," replied Natasha, and wishing to himself the fact the
princess, Princess Mary was easier, fed in had oftened him.
Pierre aking his soul came to the packs and drove up his father-in-law women.
● All works of Shakespeare:
○ PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.
from http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Linux source code ;-)
khrystyna.skopyk@grammarly.com
Reading list
1. Language Modeling with N-grams, Dan Jurafsky and James H. Martin
2. Course notes for NLP by Michael Collins
3. Smoothing for statistical LM
4. Recurrent Neural Network Tutorial
5. Neural Network Methods for NLP, Yoav Goldberg, chapters 9, 13-15

Language modelling and its use cases

  • 3. What do I do at Grammarly? 1. In the past: a. word order b. possessive nouns c. sentence fragments d. different types of verb mistakes e. etc. 2. Now: a. Paragraph-level checks
  • 4. What is language modeling? Models that assign probability to the sequence of words are called language models or LM.
  • 6. Applications of language models 1. Text prediction
  • 7. Applications of language models 1. Text prediction 2. Speech recognition
  • 8. Speech recognition Which one is the most probable? 1. It’s not easy to wreck a nice beach. 2. It’s not easy to recognize speech. 3. It’s not easy to wreck an ice beach.
  • 9. Applications of language models 1. Text prediction 2. Speech recognition 3. Language identification
  • 10. Applications of language models 1. Text prediction 2. Speech recognition 3. Language identification 4. Machine translation
  • 11. Applications of language models 1. Text prediction 2. Speech recognition 3. Language identification 4. Machine translation 5. Handwriting recognition
  • 12. Applications of language models 1. Text prediction 2. Speech recognition 3. Language identification 4. Machine translation 5. Handwriting recognition 6. Error correction 7. etc.
  • 14. Language corpora A corpus is a body of text. Some popular English and Ukrainian corpora: - Gutenberg Dataset (en) - Wikipedia corpus (en) - UberText (ua) - your custom corpus - ...
  • 15. Assigning a probability to a sentence Our sentence, s: That monkey made a smart move! Our corpus is of size N (say, 10 000 sentences). P(s) = c(s) / N
  • 16. Assigning a probability to a sentence Our sentence, s: That monkey made a smart move! Our corpus is of size N (say, 10 000 sentences). P(s) = c(s) / N
  • 17. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = ? P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 18. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = P(That| <s>) * P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 19. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = P(That| <s>) * P(monkey| <s>, That) * P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 20. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = P(That| <s>) * P(monkey| <s>, That) * P(made| <s> ,That, monkey) * P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 21. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = P(That| <s>) * P(monkey| <s>, That) * P(made| <s> ,That, monkey) * * P(a| <s> , That, monkey, made) P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 22. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = P(That| <s>) * P(monkey| <s>, That) * P(made| <s> ,That, monkey) * * P(a| <s> , That, monkey, made) * P(smart| <s> , That, monkey, made, a) * * P(move| <s> , That, monkey, made, a, smart) * * P(!| <s> , That, monkey, made, a, smart, move) * * P(</s>| <s> , That, monkey, made, a, smart, move, !) P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 23. Joint probability: chain rule P(<s>, That, monkey, made, a, smart, move, !, </s>) = P(That| <s>) * P(monkey| <s>, That) * P(made| <s> ,That, monkey) * * P(a| <s> , That, monkey, made) * P(smart| <s> , That, monkey, made, a) * * P(move| <s> , That, monkey, made, a, smart) * * P(!| <s> , That, monkey, made, a, smart, move) * * P(</s>| <s> , That, monkey, made, a, smart, move, !) P(x1 , x2 , …, xn ) = ∏i P(xi |x1 , …, xi-1 )
  • 24. Markov assumption
    P(x_1, x_2, …, x_n) = ∏_i P(x_i | x_{i-2}, x_{i-1})
    P(<s>, <s>, That, monkey, made, a, smart, move, !, </s>) =
      P(That | <s>, <s>)
      * P(monkey | <s>, That)
      * P(made | That, monkey)
      * P(a | monkey, made)
      * P(smart | made, a)
      * P(move | a, smart)
      * P(! | smart, move)
      * P(</s> | move, !)
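    The factorization above can be sketched in a few lines of Python. This is only an illustration: the helper name `trigram_factors` is made up for this example, and it assumes the sentence is already tokenized and padded with two `<s>` markers and one `</s>` marker, as on the slide.

```python
def trigram_factors(tokens):
    """Return (word, context) pairs for a second-order Markov factorization.

    Each word is conditioned only on the two words before it.
    """
    # Two start markers so that the first real word has a full context.
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    return [
        (padded[i], (padded[i - 2], padded[i - 1]))
        for i in range(2, len(padded))
    ]


sentence = ["That", "monkey", "made", "a", "smart", "move", "!"]
factors = trigram_factors(sentence)
for word, context in factors:
    print(f"P({word} | {context[0]}, {context[1]})")
```

    The loop prints one conditional per factor, in the same order as the product on the slide.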
  • 32. Maximum likelihood estimates
    P(made | That, monkey) = c(That, monkey, made) / c(That, monkey)
    P(a | monkey, made) = c(monkey, made, a) / c(monkey, made)
    ...
    P(x_i | x_{i-2}, x_{i-1}) = c(x_{i-2}, x_{i-1}, x_i) / c(x_{i-2}, x_{i-1})
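    A minimal sketch of these estimates in Python, assuming a tiny three-sentence toy corpus invented for this example (it is not the corpus behind the slides). The names `trigram_counts` and `mle` are hypothetical, and the context count c(x_{i-2}, x_{i-1}) is taken as the number of times the pair occurs as a trigram context.

```python
from collections import Counter


def trigram_counts(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx


def mle(word, context, tri, ctx):
    """P(word | context) = c(context, word) / c(context); 0.0 if context unseen."""
    if ctx[context] == 0:
        return 0.0
    return tri[context + (word,)] / ctx[context]


# Toy corpus (an assumption for illustration only).
corpus = [
    "That monkey made a smart move !".split(),
    "That monkey made a mess !".split(),
    "That monkey is smart !".split(),
]
tri, ctx = trigram_counts(corpus)
p_made = mle("made", ("That", "monkey"), tri, ctx)  # 2 of 3 contexts continue with "made"
p_a = mle("a", ("monkey", "made"), tri, ctx)        # both contexts continue with "a"
print(p_made, p_a)
```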
  • 33. N-grams
    An n-gram is a sequence of n words. Example sentence: The monkey is eating a banana!
    - unigrams: The, monkey, is, eating, a, banana, !
    - bigrams: <s> The, The monkey, monkey is, is eating, eating a, a banana, ...
    - trigrams: <s> <s> The, <s> The monkey, The monkey is, monkey is eating, ...
    - ...
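    Extracting n-grams like the ones listed above could look roughly as follows. The function name `ngrams` is hypothetical, and the sketch pads only the start of the sentence with n-1 `<s>` markers, since the slide elides what happens at the end of the sentence.

```python
def ngrams(tokens, n):
    """Return all n-grams of a tokenized sentence, padded with n-1 start markers."""
    padded = ["<s>"] * (n - 1) + list(tokens)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


tokens = ["The", "monkey", "is", "eating", "a", "banana", "!"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams[:3])
print(trigrams[:3])
```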
  • 37. N-grams
    From the corpus (of size 50 000) we get:

    bigram              count
    <s> The             5 678
    The monkey             97
    monkey is              65
    is eating           3 440
    eating a            1 675
    ...                   ...

    trigram             count
    <s> <s> The         5 678
    <s> The monkey          3
    The monkey is           0
    monkey is eating        8
    is eating a           457
    ...                   ...
  • 38. Assigning a probability
    P(x_1, x_2, …, x_n) = ∏_i P(x_i | x_{i-2}, x_{i-1})
    P(<s>, <s>, That, monkey, made, a, smart, move, !, </s>)
      = c(<s>, <s>, That) / c(<s>, <s>) * c(<s>, That, monkey) / c(<s>, That) * c(That, monkey, made) / c(That, monkey) * …
      = 1189 / 50000 * 12 / 1189 * c(That, monkey, made) / c(That, monkey) * …
      ≈ 0.0000003305
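    The same computation can be sketched end to end in Python. The counts behind the slide are not fully given, so this example uses a small toy corpus invented for illustration; the function names are hypothetical. It sums log-probabilities instead of multiplying raw probabilities, a common way to avoid numerical underflow when many small factors are combined.

```python
import math
from collections import Counter


def train(sentences):
    """Collect trigram counts and context counts from tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx


def sentence_logprob(tokens, tri, ctx):
    """Log-probability of a sentence under an unsmoothed trigram model."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        context = tuple(padded[i - 2:i])
        count = tri[context + (padded[i],)]
        if count == 0:
            # An unseen trigram makes the whole product zero.
            return float("-inf")
        total += math.log(count / ctx[context])
    return total


# Toy corpus (an assumption for illustration only).
corpus = [
    "That monkey made a smart move !".split(),
    "That monkey made a mess !".split(),
    "That monkey is smart !".split(),
]
tri, ctx = train(corpus)
seen = sentence_logprob("That monkey made a smart move !".split(), tri, ctx)
unseen = sentence_logprob("That monkey made a smart mess !".split(), tri, ctx)
print(math.exp(seen), unseen)
```

    The second sentence contains a trigram that never occurs in the toy corpus, so the unsmoothed model assigns it probability zero.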
  • 44. Smoothing techniques ● Add-1 smoothing ● Add-k smoothing ● Backoff ● Interpolation ● Kneser-Ney smoothing ● ...
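    As an illustration of the first two items, here is a minimal add-k sketch (add-1 is the special case k = 1). It assumes the usual formulation P(w | u, v) = (c(u, v, w) + k) / (c(u, v) + k * V), where V is the vocabulary size; the counts and the four-word vocabulary below are invented for this example.

```python
from collections import Counter


def add_k_prob(word, context, tri, ctx, vocab_size, k=1.0):
    """Add-k smoothed trigram probability; k=1 gives add-1 smoothing."""
    return (tri[context + (word,)] + k) / (ctx[context] + k * vocab_size)


# Hypothetical counts for the context ("a", "smart").
vocab = ["move", "monkey", "car", "</s>"]
tri = Counter({("a", "smart", "move"): 3, ("a", "smart", "monkey"): 1})
ctx = Counter({("a", "smart"): 4})

context = ("a", "smart")
probs = {w: add_k_prob(w, context, tri, ctx, len(vocab)) for w in vocab}
print(probs)
```

    The unseen continuation "car" now receives a small non-zero probability, and the smoothed values still sum to one over the vocabulary.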
  • 45. Statistical LM: challenges
    ● They do not generalize
      ○ red car = 2 390, blue car = 1 132, purple car = 0
    ● They need intricate smoothing techniques
      ○ e.g., the backoff order is fixed and must be designed by hand
    ● They do not capture long-range dependencies
      ○ That smart monkey, which I told you about, was also sitting on my car!
    ● Scaling to larger n-grams is very expensive
      ○ the number of possible n-grams over a vocabulary of size V is V^n
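    To get a feel for the V^n growth, the short sketch below prints the number of possible n-grams for a few values of n. The vocabulary size of 10 000 is a made-up figure for illustration, not one taken from the slides.

```python
def possible_ngrams(vocab_size, n):
    """Number of distinct n-grams over a vocabulary of the given size."""
    return vocab_size ** n


V = 10_000  # hypothetical vocabulary size
for n in range(1, 6):
    print(f"n={n}: {possible_ngrams(V, n):.0e} possible n-grams")
```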
  • 47. Neural LM image from http://torch.ch/blog/2016/07/25/nce.html
  • 48. One-hot encodings ● Sparse vectors of size V (vocabulary) image from https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
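    A one-hot encoding can be sketched in plain Python as follows. The four-word vocabulary and the helper names are assumptions made for this example; real vocabularies are far larger, which is what makes the vectors so sparse.

```python
def one_hot(word, vocab):
    """Return a vector of size len(vocab) with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


vocab = ["red", "blue", "purple", "car"]  # hypothetical toy vocabulary
red = one_hot("red", vocab)
blue = one_hot("blue", vocab)
print(red, blue, dot(red, blue))
```

    The dot product of two different one-hot vectors is always zero, so the encoding by itself says nothing about which words are similar.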
  • 49. Challenge accepted
    ● They do not generalize
      ○ red, blue, and black appear in similar contexts
    ● Intricate smoothing techniques
      ○ no need for additional smoothing since we use word vectors and backprop
    ● They do not capture long-range dependencies
      ○ That smart monkey, which I told you about, was also sitting on my car!
    ● Scaling to larger n-grams is very expensive
  • 50. Neural LM: challenges
    ● They generalize better :-)
      ○ brown horse, white horse, green horse ?!?
    ● They take a long time to train
    ● They are very expensive
  • 51. You try it ● KenLM ○ https://github.com/kpu/kenlm ● Simple RNN language model ○ https://github.com/pytorch/examples/tree/master/word_language_model ● LSTM by Salesforce ○ https://github.com/salesforce/awd-lstm-lm
  • 52. Let’s have some fun ;-) from http://karpathy.github.io/2015/05/21/rnn-effectiveness/
    ● Baby name generation:
      ○ Alessia, Mareanne, Chrestina, Hi, Saddie
    ● Leo Tolstoy’s War and Peace:
      ○ "Why do what that day," replied Natasha, and wishing to himself the fact the princess, Princess Mary was easier, fed in had oftened him. Pierre aking his soul came to the packs and drove up his father-in-law women.
    ● All works of Shakespeare:
      ○ PANDARUS: Alas, I think he shall be come approached and the day When little srain would be attain'd into being never fed, And who is but a chain and subjects of his death, I should not sleep.
  • 57. Reading list 1. Language Modeling with N-grams, Dan Jurafsky and James H. Martin 2. Course notes for NLP by Michael Collins 3. Smoothing for statistical LM 4. Recurrent Neural Network Tutorial 5. Neural Network Methods for NLP, Yoav Goldberg, chapters 9, 13-15