NEXT WORD
PREDICTION
KHUSHI SOLANKI 91800151036
UMMEHANI VAGHELA 91800151019
UNDER GUIDANCE OF:
Prof. KAUSHA MASANI
ABSTRACT
This abstract describes what the next word prediction
model we build will do: the model considers the last
word of a particular sentence and predicts the next
possible word. We will use methods from natural language
processing, language modelling, and deep learning.
We will start by analysing the data, followed by
pre-processing it. We will then tokenize the data and
finally build the deep learning model. The deep
learning model will be built using LSTMs.
INTRODUCTION
This project covers what the next word prediction model we
build will do. The model considers the last word of a
particular sentence and predicts the next possible word. We
will use methods from natural language processing (NLP),
language modelling, and deep learning. We will start by
analysing the data, followed by pre-processing it. We will
then tokenize the data and finally build the deep learning
model. The deep learning model will be built using LSTMs; a
minimal sketch of this pipeline follows below.
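The slides do not name a specific deep-learning library, so the following is a minimal sketch of the tokenize-then-train pipeline assuming TensorFlow/Keras; the toy corpus, context length, and layer sizes are illustrative placeholders rather than the project's actual configuration.

# Minimal sketch of the pipeline above (assumes TensorFlow/Keras is installed).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

corpus = ["i have to go to the store", "i would like a coke and some fries"]

# 1. Tokenize: build a word-to-index vocabulary (index 0 is reserved for padding).
words = sorted({w for line in corpus for w in line.split()})
word_to_id = {w: i + 1 for i, w in enumerate(words)}
id_to_word = {i: w for w, i in word_to_id.items()}
vocab_size = len(word_to_id) + 1

# 2. Build (context, next word) training pairs, left-padded to a fixed length.
ctx_len = 3
X, y = [], []
for line in corpus:
    ids = [word_to_id[w] for w in line.split()]
    for i in range(1, len(ids)):
        ctx = ids[max(0, i - ctx_len):i]
        X.append([0] * (ctx_len - len(ctx)) + ctx)
        y.append(ids[i])
X, y = np.array(X), np.array(y)

# 3. Language model: embedding -> LSTM -> softmax over the vocabulary.
model = Sequential([
    Embedding(vocab_size, 32),
    LSTM(64),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=50, verbose=0)

# 4. Predict the most likely next word for a new context.
ctx = [word_to_id[w] for w in "go to the".split()][-ctx_len:]
ctx = np.array([[0] * (ctx_len - len(ctx)) + ctx])
print(id_to_word.get(int(np.argmax(model.predict(ctx, verbose=0))), "?"))

On a real corpus the same prediction step would be called after whatever the user has typed so far, which is the behaviour described above.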
WORD PREDICTION
• What are likely completions of the following sentences?
• “Oh, that must be a syntax …”
• “I have to go to the …”
• “I’d also like a Coke and some …”
• Word prediction (guessing what might come next) has many
applications:
• speech recognition
• handwriting recognition
• augmentative communication systems
• spelling error detection/correction
N-GRAMS
• A simple model for word prediction is the n-gram
model.
• An n-gram model “uses the previous N-1 words to
predict the next one”.
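As a concrete illustration of "uses the previous N-1 words to predict the next one", here is a small plain-Python counting sketch; the toy sentence is made up for the example.

from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def predict_next(model, context):
    """Return the word seen most often after the given context, if any."""
    counts = model[tuple(context)]
    return counts.most_common(1)[0][0] if counts else None

tokens = "the white rabbit ran and the white rabbit hid".split()
trigram = train_ngram(tokens, n=3)              # N = 3: use the previous 2 words
print(predict_next(trigram, ["the", "white"]))  # -> 'rabbit'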
USING CORPORA IN WORD PREDICTION
• A corpus is a collection of text (or speech).
• In order to do word prediction using n-grams
we must compute conditional word
probabilities.
• Conditional word probabilities are computed
from their occurrences in a corpus or several
corpora.
COUNTING TYPES AND TOKENS IN CORPORA
• Prior to computing conditional probabilities, counts are
needed.
• Most counts are based on word form (e.g. cat and cats
are two distinct word forms).
• We distinguish types from tokens:
• “We use types to mean the number of distinct words
in a corpus”, and
• “tokens to mean the total number of running
words.” [pg. 195]
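For instance, on a toy nine-word text the two counts come apart:

# Tokens = running words; types = distinct word forms (cat and cats count separately).
words = "the cat sat on the mat near the cats".split()
print(len(words))       # 9 tokens
print(len(set(words)))  # 7 types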
WORD (UNIGRAM) PROBABILITIES
• Simple model: all words have equal probability. For example,
assuming a lexicon of 100,000 words we have P(the)=0.00001,
P(rabbit) = 0.00001.
• Consider the difference between “the” and “rabbit” in the
following contexts:
• I bought …
• I read …
UNIGRAM
• Somewhat better model: consider frequency of
occurrence. For example, we might have P(the) = 0.07,
P(rabbit)=0.00001.
• Consider the difference between
• Anyhow, …
• Just then, the white …
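The frequency-based unigram estimate is just relative frequency, P(w) = C(w) / total tokens; a quick sketch on a made-up toy corpus:

from collections import Counter

tokens = "the white rabbit saw the dog and the dog saw the rabbit".split()
counts = Counter(tokens)
total = len(tokens)

# P(w) = C(w) / N, the relative frequency of w in the corpus.
print(counts["the"] / total)     # 4/12 ≈ 0.33
print(counts["rabbit"] / total)  # 2/12 ≈ 0.17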
BIGRAM
• Bigrams consider two-word sequences.
• P(rabbit | white) = …high…
• P(the | white) = …low…
• P(rabbit | anyhow) = …low…
• P(the | anyhow) = …high…
• The probability of occurrence of a word is
dependent on the context of occurrence: the
probability is conditional.
PROBABILITIES
• We want to know the probability of a word given a preceding
context.
• We can compute the probability of a word sequence:
P(w_1 w_2 … w_n)
• How do we compute the probability that w_n occurs after the
sequence (w_1 w_2 … w_{n-1})?
• P(w_1 w_2 … w_n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) …
P(w_n | w_1 w_2 … w_{n-1})
• How do we find probabilities like P(w_n | w_1 w_2 … w_{n-1})?
• We approximate! In a bigram model we approximate using only the
previous word: P(w_n | w_1 w_2 … w_{n-1}) ≈ P(w_n | w_{n-1}).
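A sketch of that approximation on a toy corpus: the chain-rule product is replaced by a product of bigram terms, each estimated from counts (the estimate itself is simplified on the next slide). The corpus and numbers are illustrative only.

from collections import Counter

tokens = "the white rabbit ran and the white rabbit hid".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    """Estimate P(w | prev) as C(prev w) / C(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def sequence_prob(words):
    """Bigram approximation: P(w_1) times the product of P(w_i | w_{i-1})."""
    prob = unigrams[words[0]] / len(tokens)   # P(w_1) from unigram relative frequency
    for prev, w in zip(words, words[1:]):
        prob *= p_bigram(w, prev)
    return prob

print(sequence_prob(["the", "white", "rabbit"]))  # 2/9 * 1.0 * 1.0 ≈ 0.22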
COMPUTING CONDITIONAL PROBABILITIES
• How do we compute P(w_n | w_{n-1})?
• P(w_n | w_{n-1}) = C(w_{n-1} w_n) / ∑_w C(w_{n-1} w)
• Since the sum of all bigram counts that
start with w_{n-1} must be the same as the
unigram count for w_{n-1}, we can simplify:
• P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})
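A quick check of the simplification on a toy corpus: summing the counts of all bigrams that start with w_{n-1} gives the same denominator as the unigram count C(w_{n-1}) (up to a corpus-final word, which starts no bigram), so both forms produce the same estimate.

from collections import Counter

tokens = "the white rabbit ran and the white rabbit hid".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

prev, w = "white", "rabbit"

# Full definition: divide C(prev w) by the sum of all bigram counts starting with prev.
denom_full = sum(c for (w1, _), c in bigrams.items() if w1 == prev)
# Simplified form: divide by the unigram count C(prev).
denom_simple = unigrams[prev]

print(bigrams[(prev, w)] / denom_full)    # P(rabbit | white) = 2/2 = 1.0
print(bigrams[(prev, w)] / denom_simple)  # same value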
UML DIAGRAM
THANK YOU!!
