NEXT WORD
PREDICTION
KHUSHI SOLANKI 91800151036
UMMEHANI VAGHELA 91800151019
UNDER GUIDANCE OF:
Prof. KAUSHA MASANI
ABSTRACT
This project builds a next word prediction model: the model
considers the last word of a particular sentence and predicts
the next possible word. We will use methods from natural
language processing, language modelling, and deep learning.
We will start by analysing the data, followed by
pre-processing; we will then tokenize the data and finally
build the deep learning model using LSTMs.
INTRODUCTION
This project covers what the next word prediction model will do: the
model considers the last word of a particular sentence and predicts
the next possible word. We will use methods from natural language
processing (NLP), language modelling, and deep learning. We will
start by analysing the data, followed by pre-processing; we will
then tokenize the data and finally build the deep learning model
using LSTMs.
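As a rough sketch of this pipeline (analyse, pre-process, tokenize, model), the following minimal example assumes TensorFlow/Keras; the toy corpus, layer sizes, and epoch count are illustrative placeholders, not the project's actual data or configuration.

```python
# Minimal sketch of the next-word LSTM pipeline, assuming TensorFlow/Keras.
# The corpus, layer sizes, and training settings below are toy assumptions.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

text = "the white rabbit ran and the white rabbit hid"  # toy corpus

# Tokenize: map each word to an integer ID.
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
ids = tokenizer.texts_to_sequences([text])[0]
vocab_size = len(tokenizer.word_index) + 1  # +1: index 0 is reserved

# Training pairs: last word -> next word, as described above.
X = np.array(ids[:-1]).reshape(-1, 1)
y = np.array(ids[1:])

model = Sequential([
    Embedding(vocab_size, 10),
    LSTM(50),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=300, verbose=0)

# Predict the most likely word after "white".
seed = np.array(tokenizer.texts_to_sequences(["white"])[0]).reshape(1, 1)
probs = model.predict(seed, verbose=0)[0]
print(tokenizer.index_word[int(np.argmax(probs))])  # likely "rabbit"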
WORD PREDICTION
• What are likely completions of the following sentences?
• “Oh, that must be a syntax …”
• “I have to go to the …”
• “I’d also like a Coke and some …”
• Word prediction (guessing what might come next) has many
applications:
• speech recognition
• handwriting recognition
• augmentative communication systems
• spelling error detection/correction
N-GRAMS
• A simple model for word prediction is the n-gram
model.
• An n-gram model “uses the previous N-1 words to
predict the next one”.
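A toy illustration of this definition with N = 3: the model uses the previous two words to predict the next. The corpus here is made up for the example.

```python
# A toy trigram (N = 3) model: use the previous N-1 = 2 words to
# predict the next one. The corpus is illustrative.
from collections import Counter, defaultdict

corpus = "i have to go to the store and i have to go to the bank".split()

# Count which word follows each two-word context.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict(w1, w2):
    """Most frequent word seen after the context (w1, w2), or None."""
    following = counts[(w1, w2)]
    return following.most_common(1)[0][0] if following else None

print(predict("go", "to"))   # -> "the"
print(predict("to", "the"))  # -> "store" or "bank" (tied counts)
```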
USING CORPORA IN WORD PREDICTION
• A corpus is a collection of text (or speech).
• In order to do word prediction using n-grams
we must compute conditional word
probabilities.
• Conditional word probabilities are computed
from their occurrences in a corpus or several
corpora.
COUNTING TYPES AND TOKENS IN CORPORA
• Prior to computing conditional probabilities, counts are
needed.
• Most counts are based on word form (i.e. cat and cats
are two distinct word forms).
• We distinguish types from tokens:
• “We use types to mean the number of distinct words
in a corpus”
• We use “tokens to mean the total number of running
words.” [pg. 195]
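The type/token distinction in code, assuming simple whitespace tokenization of a made-up sentence:

```python
# Counting word types vs. tokens in a tiny corpus; whitespace
# tokenization is a simplification.
corpus = "the cat saw the cats and the cat ran"
tokens = corpus.split()  # every running word
types = set(tokens)      # distinct word forms ("cat" != "cats")

print(len(tokens))  # 9 tokens
print(len(types))   # 6 types: the, cat, saw, cats, and, ran
```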
WORD (UNIGRAM) PROBABILITIES
• Simple model: all words have equal probability. For example,
assuming a lexicon of 100,000 words, we have P(the) = 0.00001
and P(rabbit) = 0.00001.
• Consider the difference between “the” and “rabbit” in the
following contexts:
• I bought …
• I read …
UNIGRAM
• Somewhat better model: consider frequency of
occurrence. For example, we might have P(the) = 0.07,
P(rabbit)=0.00001.
• Consider the difference between
• Anyhow, …
• Just then, the white …
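The frequency-based unigram model above amounts to relative frequency, P(w) = C(w) / total tokens. A small sketch on a made-up corpus (the 0.07 figure on the slide comes from a real corpus; the numbers here are illustrative):

```python
# Relative-frequency unigram probabilities on a toy corpus:
# P(w) = C(w) / total tokens.
from collections import Counter

tokens = "just then the white rabbit ran past the white fence".split()
counts = Counter(tokens)
total = len(tokens)

P = {w: c / total for w, c in counts.items()}
print(P["the"])     # 0.2 (2 of 10 tokens)
print(P["rabbit"])  # 0.1
```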
BIGRAM
• Bigrams consider two-word sequences.
• P(rabbit | white) = …high…
• P(the | white) = …low…
• P(rabbit | anyhow) = …low…
• P(the | anyhow) = …high…
• The probability of occurrence of a word is
dependent on the context of occurrence: the
probability is conditional.
PROBABILITIES
• We want to know the probability of a word given a preceding
context.
• We can compute the probability of a word sequence: P(w1 w2 … wn).
• How do we compute the probability that wn occurs after the
sequence w1 w2 … wn-1?
• By the chain rule:
P(w1 w2 … wn) = P(w1) P(w2 | w1) P(w3 | w1 w2) … P(wn | w1 w2 … wn-1)
• How do we find probabilities like P(wn | w1 w2 … wn-1)?
• We approximate! In a bigram model we use only the previous
word: P(wn | wn-1), as sketched below.
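A sketch of scoring a word sequence under the bigram approximation; the corpus is a toy example, and a real implementation would use log-probabilities and smoothing to handle unseen bigrams.

```python
# Scoring a sentence with the bigram approximation of the chain rule:
# P(w1 ... wn) ~= P(w1) * product of P(wi | w(i-1)). Toy corpus; real
# systems use log-probabilities and smoothing for unseen bigrams.
from collections import Counter

tokens = "the white rabbit saw the white fence".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
total = len(tokens)

def p_bigram(w, prev):
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(words):
    p = unigrams[words[0]] / total  # P(w1)
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)      # P(wi | w(i-1))
    return p

print(p_sentence("the white rabbit".split()))  # 2/7 * 1 * 0.5 ~= 0.143
```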
COMPUTING CONDITIONAL PROBABILITIES
• How do we compute P(wn | wn-1)?
• P(wn | wn-1) = C(wn-1 wn) / ∑w C(wn-1 w)
• Since the sum of all bigram counts that start with wn-1 must
equal the unigram count of wn-1, we can simplify:
• P(wn | wn-1) = C(wn-1 wn) / C(wn-1)
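A small check of this simplification on toy counts: summing the counts of all bigrams that start with wn-1 gives the same denominator as the unigram count (except for a corpus-final word, which starts no bigram).

```python
# Verifying the simplification above on toy counts: the sum of all
# bigram counts starting with wn-1 equals the unigram count of wn-1
# (except for the final token of the corpus, which starts no bigram).
from collections import Counter

tokens = "the white rabbit and the white dog".split()
unigram = Counter(tokens)
bigram = Counter(zip(tokens, tokens[1:]))

prev = "white"
sum_form = sum(c for (w1, _), c in bigram.items() if w1 == prev)
print(sum_form, unigram[prev])  # 2 2 -> denominators agree

print(bigram[(prev, "rabbit")] / unigram[prev])  # P(rabbit | white) = 0.5
```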
UML DIAGRAM
THANK YOU!!