UNIT 5 AI APPLICATIONS
SYLLABUS
AI applications
• Language Models
• Probabilistic Language Models
• Information Retrieval
• Information Extraction
• Natural Language Processing
• Machine Translation
• Speech Recognition
• Robot
• Hardware – Perception
• Planning – Moving
PROBABILISTIC LANGUAGE PROCESSING
• A corpus-based approach to understanding language.
• A corpus (plural corpora) is a large collection of text, such as the billions of pages that make up the World Wide Web.
• The text is written by and for humans; the task of the software is to make it easier for the human to find the right information.
• This approach implies the use of statistics and learning to take advantage of the corpus.
• It relies on probabilistic language models that can be learned from data.
PROBABILISTIC LANGUAGE MODELS
• Learning is largely just a matter of counting occurrences in the corpus.
• Probability can then be used to choose the most likely interpretation.
• A probabilistic language model defines a probability distribution over a (possibly infinite) set of strings.
• Examples: the bigram and trigram language models used in speech recognition.
UNIGRAM AND BIGRAM MODELS
• A unigram model assigns a probability P(w) to each word in the lexicon.
• The model assumes that words are chosen independently, so the probability of a string is just the product of the probabilities of its words:
  Πi P(wi)
• A bigram model assigns a probability to each word given the previous word:
  Πi P(wi | wi-1)
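
A minimal sketch of these two product formulas (not part of the original slides): the probability tables below are made-up illustrative values, not counts learned from a real corpus.

# Sketch: string probability under a unigram and a bigram model.
# The probability tables are made-up illustrative values.
P_unigram = {"the": 0.20, "dog": 0.05, "barks": 0.02}
P_bigram = {("<s>", "the"): 0.30, ("the", "dog"): 0.10, ("dog", "barks"): 0.25}

def unigram_prob(words):
    # Product of P(wi) over the string.
    p = 1.0
    for w in words:
        p *= P_unigram[w]
    return p

def bigram_prob(words):
    # Product of P(wi | wi-1), starting from a sentence-start marker <s>.
    p = 1.0
    prev = "<s>"
    for w in words:
        p *= P_bigram[(prev, w)]
        prev = w
    return p

sentence = ["the", "dog", "barks"]
print(unigram_prob(sentence))  # 0.20 * 0.05 * 0.02 = 0.0002
print(bigram_prob(sentence))   # 0.30 * 0.10 * 0.25 = 0.0075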
N-GRAM MODEL
• In general, an n-gram model conditions on the previous n − 1 words, assigning the probability
  Πi P(wi | wi-(n-1) … wi-1)
• Comparing the models, the probability each assigns to its own randomly generated string is roughly:
• 10⁻¹⁰ for the trigram model,
• 10⁻²⁹ for the bigram model, and
• 10⁻⁵⁹ for the unigram model.
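
Since learning an n-gram model is just a matter of counting occurrences, the conditional probabilities can be estimated directly from counts. The sketch below uses a tiny made-up corpus and unsmoothed maximum-likelihood estimates.

from collections import Counter

def ngram_counts(words, n):
    # Count all n-grams (as tuples of words) in the word list.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def ngram_prob(words, n, context, word):
    # Maximum-likelihood estimate of P(word | context), where context is a
    # tuple of the previous n-1 words. No smoothing: unseen n-grams get 0.
    grams = ngram_counts(words, n)
    contexts = ngram_counts(words, n - 1)
    return grams[context + (word,)] / contexts[context] if contexts[context] else 0.0

corpus = "the dog barks and the dog sleeps".split()
print(ngram_prob(corpus, 2, ("the",), "dog"))          # 2/2 = 1.0
print(ngram_prob(corpus, 3, ("the", "dog"), "barks"))  # 1/2 = 0.5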
SMOOTHING
• In any corpus, many possible word pairs will have a count of zero, so we need some way of smoothing over the zero counts.
• ADD-ONE SMOOTHING:
• The simplest way to do this, called add-one smoothing, adds one to the count of every possible bigram.
• So if there are N words in the corpus and B possible bigrams, then each bigram with an actual count of c is assigned a probability estimate of
  (c + 1) / (N + B)
• This method eliminates the problem of zero-probability n-grams, but the assumption that every count should be incremented by exactly one is questionable.
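
A minimal sketch of the (c + 1)/(N + B) estimate above; the corpus size and vocabulary are made-up numbers.

def add_one_estimate(c, N, B):
    # Add-one (Laplace) smoothed estimate for a bigram with raw count c,
    # given N words in the corpus and B possible bigrams.
    return (c + 1) / (N + B)

# Made-up figures: a 1,000-word corpus over a 50-word vocabulary,
# so B = 50 * 50 = 2,500 possible bigrams.
N, B = 1000, 50 * 50
print(add_one_estimate(0, N, B))  # an unseen bigram still gets a non-zero estimate
print(add_one_estimate(7, N, B))  # a bigram actually seen 7 times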
LINEAR INTERPOLATION SMOOTHING
• Another approach combines the trigram, bigram, and unigram models by linear interpolation:
  P̂(wi | wi-2 wi-1) = c3 P(wi | wi-2 wi-1) + c2 P(wi | wi-1) + c1 P(wi),  where c3 + c2 + c1 = 1
• The parameters ci can be fixed, or they can be trained with an EM algorithm.
• It is also possible to make the values of ci depend on the n-gram counts, so that we place a higher weight on the probability estimates that are derived from higher counts.
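
A sketch of the interpolation with fixed weights (training the weights with EM is not shown); the component probability estimates are made-up values.

def interpolated_prob(p_tri, p_bi, p_uni, c3=0.6, c2=0.3, c1=0.1):
    # Linear interpolation of trigram, bigram, and unigram estimates,
    # with fixed weights satisfying c3 + c2 + c1 = 1.
    assert abs(c3 + c2 + c1 - 1.0) < 1e-9
    return c3 * p_tri + c2 * p_bi + c1 * p_uni

# Made-up component estimates for one word given its two predecessors:
# the trigram was never seen, but the interpolated estimate stays non-zero.
print(interpolated_prob(p_tri=0.0, p_bi=0.004, p_uni=0.0005))  # 0.00125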
VITERBI SEGMENTATION ALGORITHM
• The algorithm takes as input a unigram word probability distribution, P(word), and a string.
• For each position i in the string, it stores in best[i] the probability of the most probable segmentation spanning from the start up to i.
• It also stores in words[i] the word ending at position i that yielded the best probability.
• Once it has built up the best and words arrays in dynamic-programming fashion, it works backwards through words to find the best path.
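
A sketch of the segmentation procedure described above, assuming a small made-up unigram distribution P(word) and a string that can be fully segmented.

def viterbi_segment(text, P):
    # Most probable segmentation of `text` into words, given a unigram
    # distribution P (a dict mapping word -> probability).
    # best[i]  : probability of the best segmentation of text[:i]
    # words[i] : last word of that segmentation
    n = len(text)
    best = [1.0] + [0.0] * n
    words = [""] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            w = text[j:i]
            if w in P and best[j] * P[w] > best[i]:
                best[i] = best[j] * P[w]
                words[i] = w
    # Work backwards through `words` to recover the best path.
    sequence = []
    i = n
    while i > 0:
        sequence.insert(0, words[i])
        i -= len(words[i])
    return sequence, best[n]

# Made-up unigram probabilities for illustration:
P = {"now": 0.04, "is": 0.06, "the": 0.08, "time": 0.02, "no": 0.03, "wis": 0.0001}
print(viterbi_segment("nowisthetime", P))
# (['now', 'is', 'the', 'time'], 3.84e-06)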
PROBABILISTIC CONTEXT-FREE GRAMMARS
• n-gram models take advantage of co-occurrence statistics in the corpora, but they have no notion of grammar at distances greater than n.
• An alternative language model is the PROBABILISTIC CONTEXT-FREE GRAMMAR (PCFG).
• A PCFG consists of a CFG in which each rewrite rule has an associated probability.
• The sum of the probabilities across all rules with the same left-hand side is 1.
PROBABILISTIC CONTEXT-FREE GRAMMAR (PCFG) AND LEXICON
Note: The numbers in square brackets indicate the probability that a left-hand-side symbol will be rewritten with the corresponding rule.
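
The original grammar figure is not reproduced here, so the toy PCFG below is only a made-up stand-in: it shows the rule-probability convention and the constraint that the probabilities for each left-hand side sum to 1.

# A made-up toy PCFG (not the grammar from the original figure).
# Each left-hand-side symbol maps to a list of (right-hand side, probability);
# the probability plays the role of the number in square brackets.
grammar = {
    "S":       [(("NP", "VP"), 1.00)],
    "NP":      [(("Noun",), 0.60), (("Article", "Noun"), 0.40)],
    "VP":      [(("Verb",), 0.70), (("Verb", "NP"), 0.30)],
    "Noun":    [(("dog",), 0.50), (("cat",), 0.50)],
    "Verb":    [(("barks",), 1.00)],
    "Article": [(("the",), 1.00)],
}

# Check: probabilities across all rules with the same left-hand side sum to 1.
for lhs, rules in grammar.items():
    total = sum(p for _, p in rules)
    assert abs(total - 1.0) < 1e-9, (lhs, total)
print("every left-hand side sums to 1")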
PARSE TREE
The probability of a string, P(words), is just the sum of the probabilities of its parse trees.
The probability of a given tree is the product of the probabilities of all the rules that make up the nodes of the tree.
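
A sketch of both rules, using a made-up parse of "the dog barks" with the toy rule probabilities from the grammar sketch above.

# Rules used at the nodes of one (made-up) parse tree of "the dog barks",
# with the toy probabilities from the grammar sketch above.
rules_used = [
    ("S -> NP VP",         1.00),
    ("NP -> Article Noun", 0.40),
    ("Article -> the",     1.00),
    ("Noun -> dog",        0.50),
    ("VP -> Verb",         0.70),
    ("Verb -> barks",      1.00),
]

def tree_probability(rules):
    # Probability of a tree = product of the probabilities of its rules.
    p = 1.0
    for _, prob in rules:
        p *= prob
    return p

p_tree = tree_probability(rules_used)
print(p_tree)  # 1.0 * 0.4 * 1.0 * 0.5 * 0.7 * 1.0 = 0.14

# P(words) is the sum over all parse trees of the string; here there is one.
print(sum([p_tree]))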
