UNIT 5 AI APPLICATIONS
SYLLABUS
AI applications
• Language Models
• Probabilistic Language Models
• Information Retrieval
• Information Extraction
• Natural Language Processing
• Machine Translation
• Speech Recognition
• Robot
• Hardware – Perception
• Planning – Moving
PROBABILISTIC LANGUAGE PROCESSING
• A corpus-based approach to understanding language.
• A corpus (plural corpora) is a large collection of text, such as the billions of pages that make up the World Wide Web.
• The text is written by and for humans; the task of the software is to make it easier for humans to find the right information.
• This approach relies on statistics and on learning to take advantage of the corpus.
• It uses probabilistic language models that can be learned from data.
PROBABILISTIC LANGUAGE MODELS
• In these models, learning is just a matter of counting occurrences in the corpus.
• Probability can be used to choose the most likely interpretation.
• A probabilistic language model defines a probability distribution over a (possibly infinite) set of strings.
• Examples: the bigram and trigram language models used in speech recognition.
UNIGRAM AND BIGRAM MODELS
• A unigram model assigns a probability P(w) to each word in the lexicon.
• The model assumes that words are chosen independently, so the probability of a string is just the product of the probabilities of its words:
  $\prod_i P(w_i)$
• A bigram model assigns each word a probability given the previous word, so the probability of a string is:
  $\prod_i P(w_i \mid w_{i-1})$
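As a minimal sketch of "learning is just counting", the unigram and bigram probabilities of a string can be estimated from counts. This assumes a whitespace-tokenized corpus and maximum-likelihood estimates (neither is specified in the slides); the function names are illustrative, not from any library.

```python
from collections import Counter


def train_unigram(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def train_bigram(tokens):
    """Maximum-likelihood bigram model: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1})."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count a word as a context only where it is actually followed by another word
    context_counts = Counter(tokens[:-1])
    return {(prev, w): c / context_counts[prev] for (prev, w), c in pair_counts.items()}


def unigram_string_prob(model, words):
    """Product of P(w_i) over the words; unseen words get probability 0."""
    p = 1.0
    for w in words:
        p *= model.get(w, 0.0)
    return p


def bigram_string_prob(model, words):
    """Product of P(w_i | w_{i-1}); the first word is treated as given (a simplification)."""
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= model.get((prev, w), 0.0)
    return p


corpus = "the cat sat on the mat".split()
uni = train_unigram(corpus)
bi = train_bigram(corpus)
print(unigram_string_prob(uni, ["the", "cat"]))       # (2/6) * (1/6)
print(bigram_string_prob(bi, ["the", "cat", "sat"]))  # P(cat|the) * P(sat|cat) = 0.5 * 1.0 → 0.5
```

Without smoothing, any string containing an unseen word or word pair gets probability 0, which is the problem smoothing addresses.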
N-GRAM MODEL
• An n-gram model conditions on the previous n − 1 words, so the probability of a string is:
  $\prod_i P(w_i \mid w_{i-(n-1)} \ldots w_{i-1})$
• The models themselves agree that higher-order models are better: the probability each model assigns to the random string it generated is $10^{-10}$ for the trigram model, $10^{-29}$ for the bigram model, and $10^{-59}$ for the unigram model.
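The general n-gram case, including a model scoring a random string that it generated itself, can be sketched as below. This is an illustration under stated assumptions: maximum-likelihood counts, no smoothing, no sentence-boundary symbols, and a hypothetical `NGramModel` class. It does not reproduce the probabilities quoted above, which depend on the original training corpus.

```python
import random
from collections import Counter, defaultdict


class NGramModel:
    """Maximum-likelihood n-gram model: P(w_i | w_{i-n+1} ... w_{i-1})."""

    def __init__(self, tokens, n):
        self.n = n
        # context tuple (previous n-1 words) -> Counter of the words that followed it
        self.next_counts = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            self.next_counts[context][tokens[i + n - 1]] += 1

    def prob(self, context, word):
        counts = self.next_counts.get(tuple(context))
        if not counts:
            return 0.0  # unseen context
        return counts[word] / sum(counts.values())

    def string_prob(self, words):
        """Product of conditional probabilities for every position that has a full context."""
        k = self.n - 1
        p = 1.0
        for i in range(k, len(words)):
            p *= self.prob(words[i - k:i], words[i])
        return p

    def generate(self, length, rng):
        """Sample a random string, starting from a randomly chosen seen context."""
        words = list(rng.choice(list(self.next_counts)))
        while len(words) < length:
            counts = self.next_counts.get(tuple(words[len(words) - (self.n - 1):]))
            if not counts:
                break  # dead end: this context was never followed by a word
            choices, weights = zip(*counts.items())
            words.append(rng.choices(choices, weights=weights)[0])
        return words


rng = random.Random(0)
corpus = "the cat sat on the mat and the cat ran".split()
for n in (1, 2, 3):
    model = NGramModel(corpus, n)
    sample = model.generate(8, rng)
    print(n, " ".join(sample), model.string_prob(sample))
```

On such a tiny corpus the numbers are not meaningful; the point is that each model assigns a nonzero probability to its own sample, which is how the comparison above is set up.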
SMOOTHING
• Many word pairs that are possible in the language never appear in the corpus, so they have a count of zero and hence zero probability.
• We need some way of smoothing over the zero counts.
ADD-ONE SMOOTHING
• The simplest way to do this is add-one smoothing: add one to the count of every possible bigram.
• If there are N words in the corpus and B possible bigrams, then each bigram with an actual count of c is assigned the probability estimate
  $\frac{c + 1}{N + B}$
• This method eliminates the problem of zero-probability n-grams, but the assumption that every count should be incremented by exactly one is dubious and can lead to poor performance.
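A minimal sketch of the add-one formula exactly as stated above, assuming B is the number of ordered pairs of vocabulary words (|V|²) and that the corpus is already split into words. The function name is hypothetical.

```python
from collections import Counter


def add_one_bigram_estimator(tokens, vocabulary):
    """Return a function giving the add-one estimate (c + 1) / (N + B) for a bigram."""
    n_words = len(tokens)             # N: number of words in the corpus
    n_bigrams = len(vocabulary) ** 2  # B: every ordered pair of vocabulary words
    counts = Counter(zip(tokens, tokens[1:]))

    def estimate(w1, w2):
        # Counter returns 0 for unseen pairs, so unseen bigrams get 1 / (N + B), never 0
        return (counts[(w1, w2)] + 1) / (n_words + n_bigrams)

    return estimate


tokens = "a b a b".split()
estimate = add_one_bigram_estimator(tokens, {"a", "b"})
print(estimate("a", "b"))  # seen twice: (2 + 1) / (4 + 4) = 0.375
print(estimate("a", "a"))  # never seen: (0 + 1) / (4 + 4) = 0.125
```

Even in this tiny example, the unseen pair "a a" receives a third of the probability of the pair seen twice, which illustrates why a fixed increment of one can distort the estimates.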
LINEAR INTERPOLATION SMOOTHING
• Another approach combines the trigram, bigram, and unigram models by linear interpolation:
  $\hat{P}(w_i \mid w_{i-2} w_{i-1}) = c_3 P(w_i \mid w_{i-2} w_{i-1}) + c_2 P(w_i \mid w_{i-1}) + c_1 P(w_i)$
  where $c_3 + c_2 + c_1 = 1$.
• The parameters $c_i$ can be fixed, or they can be trained with an EM algorithm.
• The values of $c_i$ can also be made to depend on the n-gram counts, so that we place a higher weight on the probability estimates that are derived from higher counts.
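A minimal sketch of interpolation with fixed weights (EM training of the $c_i$ is not shown). The component models are passed in as functions; the hypothetical ones in the usage example return constant values chosen only to show that an unseen trigram no longer yields zero probability.

```python
import math


def interpolate(p_tri, p_bi, p_uni, c3, c2, c1):
    """Linear interpolation: c3*P(w|w2 w1) + c2*P(w|w1) + c1*P(w), with c3 + c2 + c1 = 1."""
    if not math.isclose(c3 + c2 + c1, 1.0):
        raise ValueError("interpolation weights must sum to 1")

    def prob(w2, w1, w):
        # w2 is w_{i-2}, w1 is w_{i-1}, w is w_i
        return c3 * p_tri(w2, w1, w) + c2 * p_bi(w1, w) + c1 * p_uni(w)

    return prob


# Hypothetical component models: the trigram was never seen, but the lower orders were
smoothed = interpolate(
    lambda w2, w1, w: 0.0,  # trigram estimate
    lambda w1, w: 0.5,      # bigram estimate
    lambda w: 0.1,          # unigram estimate
    c3=0.6, c2=0.3, c1=0.1,
)
print(smoothed("on", "the", "mat"))  # 0.6*0 + 0.3*0.5 + 0.1*0.1, about 0.16
```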
VITERBI SEGMENTATION ALGORITHM
• The algorithm takes as input a unigram word probability distribution, P(word), and a string (for example, text with the spaces between words removed).
• For each position i in the string, it stores in best[i] the probability of the most probable string spanning from the start up to i.
• It also stores in words[i] the word ending at position i that yielded the best probability.
• Once it has built up the best and words arrays in a dynamic programming fashion, it works backwards through words to find the best path.
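The steps above can be sketched as follows, assuming `P` is a plain dict from words to unigram probabilities and that unknown substrings have probability 0. The function name and the choice to return an empty list when no segmentation exists are assumptions of this sketch.

```python
def viterbi_segmentation(text, P):
    """Segment `text` into words using a unigram distribution P (dict word -> probability).

    best[i]  = probability of the most probable segmentation of text[:i]
    words[i] = the word ending at position i in that segmentation
    """
    n = len(text)
    best = [0.0] * (n + 1)
    words = [""] * (n + 1)
    best[0] = 1.0  # the empty prefix has probability 1
    for i in range(1, n + 1):
        for j in range(i):
            word = text[j:i]
            p = P.get(word, 0.0) * best[j]
            if p > best[i]:
                best[i] = p
                words[i] = word
    if best[n] == 0.0:
        return [], 0.0  # no segmentation uses only known words
    # Work backwards through `words` to recover the best path
    sequence = []
    i = n
    while i > 0:
        sequence.append(words[i])
        i -= len(words[i])
    sequence.reverse()
    return sequence, best[n]


P = {"the": 0.3, "cat": 0.2, "he": 0.05, "t": 0.01}
print(viterbi_segmentation("thecat", P))  # ['the', 'cat'] with probability 0.3 * 0.2
```

The competing split "t he cat" scores only 0.01 × 0.05 × 0.2, so the dynamic program keeps "the" at position 3. The strict `>` keeps the first of several equally good splits.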
PROBABILISTIC CONTEXT-FREE GRAMMARS
• n-gram models take advantage of co-occurrence statistics in the corpora, but they have no notion of grammar at distances greater than n.
• An alternative language model is the probabilistic context-free grammar (PCFG), which consists of a CFG wherein each rewrite rule has an associated probability.
• The probabilities of all rules with the same left-hand side sum to 1.
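A minimal sketch of a PCFG as a Python dict, with a check of the sum-to-one condition. The toy grammar is hypothetical and is not the grammar from the slide's figure.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG: (left-hand side, right-hand side) -> rule probability
TOY_PCFG = {
    ("S", ("NP", "VP")): 0.9,
    ("S", ("VP",)): 0.1,
    ("NP", ("Pronoun",)): 0.3,
    ("NP", ("Noun",)): 0.7,
    ("VP", ("Verb",)): 0.4,
    ("VP", ("Verb", "NP")): 0.6,
}


def lhs_totals(grammar):
    """Sum the probabilities of all rules that share a left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in grammar.items():
        totals[lhs] += p
    return dict(totals)


def is_proper_pcfg(grammar):
    """True if, for every left-hand-side symbol, its rule probabilities sum to 1."""
    return all(math.isclose(total, 1.0) for total in lhs_totals(grammar).values())


print(is_proper_pcfg(TOY_PCFG))  # True
```

`math.isclose` is used instead of `==` because sums of floating-point probabilities are rarely exactly 1.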
PROBABILISTIC CONTEXT-FREE GRAMMAR (PCFG) AND LEXICON
Note: The numbers in square brackets indicate the probability that a left-hand-side symbol will be rewritten with the corresponding rule.
PARSE TREE
The probability of a string, P(words), is just the sum of the probabilities of its parse trees. The probability of a given tree is the product of the probabilities of all the rules that make up the nodes of the tree.
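Both definitions can be sketched directly, assuming the parse trees are already known (finding them requires a parser such as CYK, which is not shown). The grammar, lexicon, tree encoding, and function names are hypothetical.

```python
# Hypothetical toy PCFG with lexicon: (lhs, rhs) -> probability.
# For each left-hand side the probabilities sum to 1.
RULES = {
    ("S", ("NP", "VP")): 0.8,
    ("S", ("VP",)): 0.2,
    ("NP", ("Noun",)): 0.6,
    ("NP", ("Noun", "Noun")): 0.4,
    ("VP", ("Verb",)): 0.5,
    ("VP", ("Verb", "NP")): 0.5,
    ("Noun", ("time",)): 0.5,
    ("Noun", ("flies",)): 0.5,
    ("Verb", ("time",)): 0.7,
    ("Verb", ("flies",)): 0.3,
}


def tree_prob(tree, rules):
    """Product of the probabilities of the rules at every node of the tree.

    A tree is a tuple (label, child, child, ...); a child is a subtree or a word (str).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules.get((label, rhs), 0.0)
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child, rules)
    return p


def string_prob(trees, rules):
    """P(words) as the sum of the probabilities of all its parse trees."""
    return sum(tree_prob(t, rules) for t in trees)


# The two parses of "time flies" under RULES, written out by hand
parse_a = ("S", ("NP", ("Noun", "time")), ("VP", ("Verb", "flies")))
parse_b = ("S", ("VP", ("Verb", "time"), ("NP", ("Noun", "flies"))))
print(tree_prob(parse_a, RULES))  # 0.8 * 0.6 * 0.5 * 0.5 * 0.3
print(tree_prob(parse_b, RULES))  # 0.2 * 0.5 * 0.7 * 0.6 * 0.5
print(string_prob([parse_a, parse_b], RULES))
```

Here the "noun then verb" reading (about 0.036) is more probable than the "imperative verb" reading (about 0.021), and P("time flies") is their sum, about 0.057.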
