NEXT WORD
PREDICTION
KHUSHI SOLANKI 91800151036
UMMEHANI VAGHELA 91800151019
UNDER GUIDANCE OF:
Prof. KAUSHA MASANI
ABSTRACT
This abstract describes what the next word prediction
model we build will do: the model considers the last
word of a particular sentence and predicts the next
possible word. We will use methods from natural language
processing, language modelling, and deep learning.
We will start by analysing the data, followed by
pre-processing it. We will then tokenize the data and
finally build the deep learning model. The deep
learning model will be built using LSTMs.
INTRODUCTION
This project covers what the next word prediction model we
build will do. The model considers the last word of a
particular sentence and predicts the next possible word. We
will use methods from natural language processing (NLP),
language modelling, and deep learning. We will start by
analysing the data, followed by pre-processing it. We will
then tokenize the data and finally build the deep learning
model. The deep learning model will be built using LSTMs; a
minimal sketch of this pipeline follows below.
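The slides do not name a specific deep-learning library, so the following is a minimal sketch of the tokenize-then-train pipeline assuming TensorFlow/Keras; the toy corpus, context length, and layer sizes are illustrative placeholders rather than the project's actual configuration.

# Minimal sketch of the pipeline above (assumes TensorFlow/Keras is installed).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

corpus = ["i have to go to the store", "i would like a coke and some fries"]

# 1. Tokenize: build a word-to-index vocabulary (index 0 is reserved for padding).
words = sorted({w for line in corpus for w in line.split()})
word_to_id = {w: i + 1 for i, w in enumerate(words)}
id_to_word = {i: w for w, i in word_to_id.items()}
vocab_size = len(word_to_id) + 1

# 2. Build (context, next word) training pairs, left-padded to a fixed length.
ctx_len = 3
X, y = [], []
for line in corpus:
    ids = [word_to_id[w] for w in line.split()]
    for i in range(1, len(ids)):
        ctx = ids[max(0, i - ctx_len):i]
        X.append([0] * (ctx_len - len(ctx)) + ctx)
        y.append(ids[i])
X, y = np.array(X), np.array(y)

# 3. Language model: embedding -> LSTM -> softmax over the vocabulary.
model = Sequential([
    Embedding(vocab_size, 32),
    LSTM(64),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=50, verbose=0)

# 4. Predict the most likely next word for a new context.
ctx = [word_to_id[w] for w in "go to the".split()][-ctx_len:]
ctx = np.array([[0] * (ctx_len - len(ctx)) + ctx])
print(id_to_word.get(int(np.argmax(model.predict(ctx, verbose=0))), "?"))

On a real corpus the same prediction step would be called after whatever the user has typed so far, which is the behaviour described above.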
WORD PREDICTION
• What are likely completions of the following sentences?
• “Oh, that must be a syntax …”
• “I have to go to the …”
• “I’d also like a Coke and some …”
• Word prediction (guessing what might come next) has many
applications:
• speech recognition
• handwriting recognition
• augmentative communication systems
• spelling error detection/correction
N-GRAMS
• A simple model for word prediction is the n-gram
model.
• An n-gram model “uses the previous N-1 words to
predict the next one”.
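As a concrete illustration of "uses the previous N-1 words to predict the next one", here is a small plain-Python counting sketch; the toy sentence is made up for the example.

from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def predict_next(model, context):
    """Return the word seen most often after the given context, if any."""
    counts = model[tuple(context)]
    return counts.most_common(1)[0][0] if counts else None

tokens = "the white rabbit ran and the white rabbit hid".split()
trigram = train_ngram(tokens, n=3)              # N = 3: use the previous 2 words
print(predict_next(trigram, ["the", "white"]))  # -> 'rabbit'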
USING CORPORA IN WORD PREDICTION
• A corpus is a collection of text (or speech).
• In order to do word prediction using n-grams
we must compute conditional word
probabilities.
• Conditional word probabilities are computed
from their occurrences in a corpus or several
corpora.
COUNTING TYPES AND TOKENS IN CORPORA
• Prior to computing conditional probabilities, counts are
needed.
• Most counts are based on word form (e.g. cat and cats
are two distinct word forms).
• We distinguish types from tokens:
• “We use types to mean the number of distinct words
in a corpus”, and
• “tokens to mean the total number of running
words.” [pg. 195]
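For instance, on a toy nine-word text the two counts come apart:

# Tokens = running words; types = distinct word forms (cat and cats count separately).
words = "the cat sat on the mat near the cats".split()
print(len(words))       # 9 tokens
print(len(set(words)))  # 7 types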
WORD (UNIGRAM) PROBABILITIES
• Simple model: all words have equal probability. For example,
assuming a lexicon of 100,000 words we have P(the)=0.00001,
P(rabbit) = 0.00001.
• Consider the difference between “the” and “rabbit” in the
following contexts:
• I bought …
• I read …
UNIGRAM
• Somewhat better model: consider frequency of
occurrence. For example, we might have P(the) = 0.07,
P(rabbit)=0.00001.
• Consider the difference between
• Anyhow, …
• Just then, the white …
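The frequency-based unigram estimate is just relative frequency, P(w) = C(w) / total tokens; a quick sketch on a made-up toy corpus:

from collections import Counter

tokens = "the white rabbit saw the dog and the dog saw the rabbit".split()
counts = Counter(tokens)
total = len(tokens)

# P(w) = C(w) / N, the relative frequency of w in the corpus.
print(counts["the"] / total)     # 4/12 ≈ 0.33
print(counts["rabbit"] / total)  # 2/12 ≈ 0.17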
BIGRAM
• Bigrams consider two-word sequences.
• P(rabbit | white) = …high…
• P(the | white) = …low…
• P(rabbit | anyhow) = …low…
• P(the | anyhow) = …high…
• The probability of occurrence of a word is
dependent on the context of occurrence: the
probability is conditional.
PROBABILITIES
• We want to know the probability of a word given a preceding
context.
• We can compute the probability of a word sequence:
P(w_1 w_2 … w_n)
• How do we compute the probability that w_n occurs after the
sequence (w_1 w_2 … w_{n-1})?
• P(w_1 w_2 … w_n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) …
P(w_n | w_1 w_2 … w_{n-1})
• How do we find probabilities like P(w_n | w_1 w_2 … w_{n-1})?
• We approximate! In a bigram model we approximate using only the
previous word: P(w_n | w_1 w_2 … w_{n-1}) ≈ P(w_n | w_{n-1}).
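A sketch of that approximation on a toy corpus: the chain-rule product is replaced by a product of bigram terms, each estimated from counts (the estimate itself is simplified on the next slide). The corpus and numbers are illustrative only.

from collections import Counter

tokens = "the white rabbit ran and the white rabbit hid".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    """Estimate P(w | prev) as C(prev w) / C(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def sequence_prob(words):
    """Bigram approximation: P(w_1) times the product of P(w_i | w_{i-1})."""
    prob = unigrams[words[0]] / len(tokens)   # P(w_1) from unigram relative frequency
    for prev, w in zip(words, words[1:]):
        prob *= p_bigram(w, prev)
    return prob

print(sequence_prob(["the", "white", "rabbit"]))  # 2/9 * 1.0 * 1.0 ≈ 0.22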
COMPUTING CONDITIONAL PROBABILITIES
• How do we compute P(w_n | w_{n-1})?
• P(w_n | w_{n-1}) = C(w_{n-1} w_n) / ∑_w C(w_{n-1} w)
• Since the sum of all bigram counts that
start with w_{n-1} must be the same as the
unigram count for w_{n-1}, we can simplify:
• P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})
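A quick check of the simplification on a toy corpus: summing the counts of all bigrams that start with w_{n-1} gives the same denominator as the unigram count C(w_{n-1}) (up to a corpus-final word, which starts no bigram), so both forms produce the same estimate.

from collections import Counter

tokens = "the white rabbit ran and the white rabbit hid".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

prev, w = "white", "rabbit"

# Full definition: divide C(prev w) by the sum of all bigram counts starting with prev.
denom_full = sum(c for (w1, _), c in bigrams.items() if w1 == prev)
# Simplified form: divide by the unigram count C(prev).
denom_simple = unigrams[prev]

print(bigrams[(prev, w)] / denom_full)    # P(rabbit | white) = 2/2 = 1.0
print(bigrams[(prev, w)] / denom_simple)  # same value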
UML DIAGRAM
THANK YOU!!
