# Lecture-18(11-02-22)Stochastics POS Tagging.pdf

POS tagging in NLP

1. DLO8012: Natural Language Processing
    - Subject Teacher: Prof. Vikas Dubey
    - RIZVI COLLEGE OF ENGINEERING, BANDRA (W), MUMBAI
2. Module-3: Syntax Analysis (CO-3) [10 hrs]
    - CO-3: Be able to model linguistic phenomena with formal grammars.
3. Conditional Probability and Tags
    - $P(\text{Verb})$ is the probability that a randomly selected word is a verb.
    - $P(\text{Verb} \mid \text{race})$ asks: what is the probability that a word is a verb, given that the word is "race"?
    - "Race" can be a noun or a verb; it is more likely to be a noun.
    - $P(\text{Verb} \mid \text{race})$ can be estimated by looking at a corpus and asking: out of all the times we saw "race", how many were verbs? That is, $P(V \mid \text{race}) = \text{Count}(\text{race is verb}) / \text{total Count}(\text{race})$.
    - In the Brown corpus, $P(\text{Verb} \mid \text{race}) = 96/98 \approx 0.98$.
    - How do we calculate this for a tag sequence, say $P(\text{NN} \mid \text{DT})$?
    - Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22
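
The counting estimate above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the toy tagged corpus is hypothetical (it is not the Brown corpus), and the function name `p_tag_given_word` is our own. The same counting idea extends to tag-to-tag probabilities such as $P(\text{NN} \mid \text{DT})$ by counting tag pairs instead of word/tag pairs.

```python
def p_tag_given_word(tagged_corpus, word, tag):
    """Estimate P(tag | word) = Count(word tagged as tag) / Count(word)."""
    word_count = 0
    pair_count = 0
    for w, t in tagged_corpus:
        if w.lower() == word:
            word_count += 1
            if t == tag:
                pair_count += 1
    if word_count == 0:
        return 0.0  # the word never occurs, so there is no estimate
    return pair_count / word_count


# Hypothetical toy corpus of (word, tag) pairs, for illustration only
corpus = [
    ("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN"),
    ("they", "PRP"), ("race", "VB"), ("daily", "RB"),
    ("a", "DT"), ("race", "NN"), ("started", "VBD"),
]

print(p_tag_given_word(corpus, "race", "VB"))  # 1 of the 3 occurrences is a verb
```
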
4. Stochastic Tagging
    - Stochastic taggers generally resolve tagging ambiguities by using a training corpus to compute the probability of a given word having a given tag in a given context.
    - A stochastic tagger is also called an HMM tagger, a maximum likelihood tagger, or a Markov model tagger, because it is based on the Hidden Markov Model (HMM).
    - For a given word sequence, HMM taggers choose the tag sequence that maximizes the product, over all words, of $P(\text{word} \mid \text{tag}) \times P(\text{tag} \mid \text{previous } n \text{ tags})$.
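
To make the HMM objective concrete, here is a minimal Python sketch that scores one candidate tag sequence as the product of $P(w_i \mid t_i)\,P(t_i \mid t_{i-1})$, i.e. the case with one previous tag. All probabilities are hypothetical, and `<s>` is an assumed sentence-start marker. With these numbers the verb reading of "race" scores higher even though $P(\text{race} \mid \text{NN})$ is larger, because the transition PRP → VB is much more likely than PRP → NN.

```python
# Hypothetical probabilities, for illustration only; "<s>" marks the sentence start.
transition = {  # P(tag | previous tag)
    ("<s>", "PRP"): 0.3,
    ("PRP", "VB"): 0.4,
    ("PRP", "NN"): 0.05,
}
emission = {  # P(word | tag)
    ("PRP", "they"): 0.1,
    ("VB", "race"): 0.001,
    ("NN", "race"): 0.002,
}


def sequence_score(words, tags):
    """Product over positions of P(w_i | t_i) * P(t_i | t_{i-1}); missing entries count as 0."""
    prev, score = "<s>", 1.0
    for w, t in zip(words, tags):
        score *= emission.get((t, w), 0.0) * transition.get((prev, t), 0.0)
        prev = t
    return score


words = ["they", "race"]
for tags in (["PRP", "VB"], ["PRP", "NN"]):
    print(tags, sequence_score(words, tags))
```
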
5. Stochastic Tagging
    - A bigram HMM tagger chooses the tag $t_i$ for word $w_i$ that is most probable given the previous tag $t_{i-1}$: $t_i = \arg\max_j P(t_j \mid t_{i-1}, w_i)$
    - From the chain rule for probability factorization: $P(t_1 \ldots t_n, w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1}, t_1 \ldots t_i)\, P(t_i \mid w_1 \ldots w_{i-1}, t_1 \ldots t_{i-1})$
    - Some approximations are introduced to simplify the model, such as:
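
If a word is assumed to depend only on its own tag, then $P(t_j \mid t_{i-1}, w_i)$ is proportional to $P(w_i \mid t_j)\,P(t_j \mid t_{i-1})$, so the normalizing term can be dropped inside the argmax. The sketch below applies that rule greedily from left to right. The tagset (`DT`, `N`, `V`) and all probabilities are hypothetical, and because a greedy tagger commits to each choice before seeing later words, it only approximates a full search over tag sequences. Here "race" is tagged `N` even though its verb emission is larger, because DT → N dominates.

```python
# Hypothetical tagset and probabilities; "<s>" marks the sentence start.
TAGS = ["DT", "N", "V"]
transition = {  # P(tag | previous tag)
    ("<s>", "DT"): 0.6, ("<s>", "N"): 0.3, ("<s>", "V"): 0.1,
    ("DT", "N"): 0.9, ("DT", "V"): 0.05, ("DT", "DT"): 0.05,
    ("N", "V"): 0.5, ("N", "N"): 0.3, ("N", "DT"): 0.2,
    ("V", "DT"): 0.6, ("V", "N"): 0.3, ("V", "V"): 0.1,
}
emission = {  # P(word | tag)
    ("DT", "the"): 0.5,
    ("N", "race"): 0.01, ("V", "race"): 0.02,
    ("N", "ended"): 0.0001, ("V", "ended"): 0.03,
}


def greedy_bigram_tag(words):
    """Tag left to right with t_i = argmax_j P(t_j | t_{i-1}) * P(w_i | t_j)."""
    prev, tags = "<s>", []
    for w in words:
        # If every score is 0 (e.g. an unseen word), max() returns the first tag in TAGS.
        best = max(TAGS, key=lambda t: transition.get((prev, t), 0.0) * emission.get((t, w), 0.0))
        tags.append(best)
        prev = best
    return tags


print(greedy_bigram_tag(["the", "race", "ended"]))
```
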
6. Stochastic Tagging
    - The word probability depends only on the tag: $P(w_i \mid w_1 \ldots w_{i-1}, t_1 \ldots t_i) \approx P(w_i \mid t_i)$
    - The dependence of a tag on the preceding tag history is limited in time, i.e. a tag depends only on the two preceding ones: $P(t_i \mid w_1 \ldots w_{i-1}, t_1 \ldots t_{i-1}) \approx P(t_i \mid t_{i-2}, t_{i-1})$
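
These two approximations reduce the model to two tables that can be estimated by relative frequency (maximum likelihood) from a tagged corpus. A minimal sketch, assuming sentences are lists of (word, tag) pairs, each sentence is padded with two hypothetical `<s>` start tags, and the tiny training set is made up; no smoothing is applied, so unseen events get probability 0.

```python
from collections import Counter


def estimate_trigram_hmm(tagged_sentences):
    """Relative-frequency estimates of P(t_i | t_{i-2}, t_{i-1}) and P(w_i | t_i)."""
    trigram, history = Counter(), Counter()
    tag_count, word_tag = Counter(), Counter()
    for sent in tagged_sentences:
        tags = ["<s>", "<s>"] + [t for _, t in sent]  # pad with two start tags
        for i in range(2, len(tags)):
            trigram[(tags[i - 2], tags[i - 1], tags[i])] += 1
            history[(tags[i - 2], tags[i - 1])] += 1
        for w, t in sent:
            tag_count[t] += 1
            word_tag[(w, t)] += 1

    def p_trans(t2, t1, t):
        n = history[(t2, t1)]
        return trigram[(t2, t1, t)] / n if n else 0.0

    def p_emit(w, t):
        n = tag_count[t]
        return word_tag[(w, t)] / n if n else 0.0

    return p_trans, p_emit


# Hypothetical training data: lists of (word, tag) pairs
sentences = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
]
p_trans, p_emit = estimate_trigram_hmm(sentences)
print(p_trans("DT", "NN", "VBZ"), p_emit("dog", "NN"))
```
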
7. Statistical POS Tagging (Allen95)
    - Let's step back a minute and recall some probability theory and its use in POS tagging.
    - Suppose that, with no context, we just want to know whether the word "flies" should be tagged as a noun or as a verb.
    - We use conditional probability for this: we want to know which is greater, $\text{PROB}(N \mid \text{flies})$ or $\text{PROB}(V \mid \text{flies})$.
    - Recall the definition of conditional probability: $\text{PROB}(a \mid b) = \text{PROB}(a \,\&\, b) / \text{PROB}(b)$, where $\text{PROB}(a \,\&\, b)$ is the probability of the two events $a$ and $b$ occurring simultaneously.
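8. Calculating POS for "flies"
    - We need to know which is greater:
        - $\text{PROB}(N \mid \text{flies}) = \text{PROB}(\text{flies} \,\&\, N) / \text{PROB}(\text{flies})$
        - $\text{PROB}(V \mid \text{flies}) = \text{PROB}(\text{flies} \,\&\, V) / \text{PROB}(\text{flies})$
    - Both share the denominator $\text{PROB}(\text{flies})$, so it suffices to compare $\text{PROB}(\text{flies} \,\&\, N)$ with $\text{PROB}(\text{flies} \,\&\, V)$, each estimated by counting on a corpus.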
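
The comparison for "flies" can be sketched with relative frequencies. The counts below are hypothetical (a 1,273,000-word corpus in which "flies" occurs 1,000 times, 400 times as a noun and 600 times as a verb) and are only for illustration; the function name `tag_probabilities` is our own. Note that the corpus size cancels out, so the result equals the ratio of counts.

```python
def tag_probabilities(word_count, word_tag_counts, corpus_size):
    """Return PROB(tag | word) for each tag via PROB(word & tag) / PROB(word)."""
    prob_word = word_count / corpus_size
    return {
        tag: (count / corpus_size) / prob_word
        for tag, count in word_tag_counts.items()
    }


# Hypothetical counts for "flies"
probs = tag_probabilities(1000, {"N": 400, "V": 600}, 1_273_000)
best = max(probs, key=probs.get)
print(probs, best)
```
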
9. Stochastic Tagging
    - The simplest stochastic taggers apply the following approaches for POS tagging.
    - Approach 1: Word Frequency Approach
        - In this approach, the stochastic tagger disambiguates words based on the probability that a word occurs with a particular tag.
        - In other words, the tag encountered most frequently with the word in the training set is the one assigned to an ambiguous instance of that word.
        - The main issue with this approach is that it may yield inadmissible sequences of tags.
10. Stochastic Tagging
    - Assign each word its most likely POS tag.
        - If $w$ has tags $t_1, \ldots, t_k$, we can use $P(t_i \mid w) = \dfrac{c(w, t_i)}{c(w, t_1) + \cdots + c(w, t_k)}$, where $c(w, t_i)$ is the number of times $w/t_i$ appears in the corpus.
        - Success rate: 91% for English.
    - Example: heat :: noun/89, verb/5, so $P(\text{noun} \mid \text{heat}) = 89/94 \approx 0.95$ and "heat" is always tagged as a noun.
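
The formula above can be sketched directly with Python's `collections.Counter`. The helper names are our own; the counts for "heat" are the ones given on the slide (noun/89, verb/5), expanded into a list of tagged tokens.

```python
from collections import Counter, defaultdict


def count_word_tags(tagged_words):
    """c[w][t] = number of times w/t appears in the corpus."""
    c = defaultdict(Counter)
    for w, t in tagged_words:
        c[w][t] += 1
    return c


def p_tag_given_word(c, w, t):
    """P(t | w) = c(w, t) / (c(w, t_1) + ... + c(w, t_k))."""
    tag_counts = c.get(w)
    if not tag_counts:
        return 0.0
    return tag_counts[t] / sum(tag_counts.values())


def most_likely_tag(c, w):
    """Baseline: the tag seen most often with w, or None for unseen words."""
    tag_counts = c.get(w)
    if not tag_counts:
        return None
    return tag_counts.most_common(1)[0][0]


# Counts for "heat" from the slide: noun/89, verb/5
corpus = [("heat", "noun")] * 89 + [("heat", "verb")] * 5
c = count_word_tags(corpus)
print(most_likely_tag(c, "heat"), p_tag_given_word(c, "heat", "noun"))
```
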
11. Stochastic Tagging
    - Approach 2: Tag Sequence Probabilities
        - This is another approach to stochastic tagging, in which the tagger calculates the probability of a given sequence of tags occurring.
        - It is also called the n-gram approach, because the best tag for a given word is determined by the probability with which it occurs with the $n-1$ previous tags.
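
For the bigram case ($n = 2$, one previous tag), the probability of a tag sequence can be approximated as $\prod_i P(t_i \mid t_{i-1})$, with each factor estimated by relative frequency. A minimal sketch: the training tag sequences are hypothetical, `<s>` is an assumed start marker, and no smoothing is applied.

```python
from collections import Counter


def train_tag_bigrams(tag_sequences):
    """Count tag pairs (prev, cur) and how often each tag is followed by another tag."""
    pair, context = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags)  # "<s>" is an assumed start marker
        for prev, cur in zip(padded, padded[1:]):
            pair[(prev, cur)] += 1
            context[prev] += 1
    return pair, context


def tag_sequence_prob(tags, pair, context):
    """P(T) approximated as the product of P(t_i | t_{i-1})."""
    prob, prev = 1.0, "<s>"
    for t in tags:
        prob *= pair[(prev, t)] / context[prev] if context[prev] else 0.0
        prev = t
    return prob


# Hypothetical tag sequences from a training corpus
training = [["DT", "NN", "VB"], ["DT", "NN", "NN"], ["NN", "VB"]]
pair, context = train_tag_bigrams(training)
print(tag_sequence_prob(["DT", "NN", "VB"], pair, context))  # (2/3) * (2/2) * (2/3) = 4/9
```
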
12. Stochastic Tagging
    - Given: a sequence of words $W = w_1, w_2, \ldots, w_n$ (a sentence), e.g., $W$ = "heat water in a large vessel"
    - Assign a sequence of tags $T = t_1, t_2, \ldots, t_n$.
    - Find the $T$ that maximizes $P(T \mid W)$.
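
Read literally, "find the $T$ that maximizes $P(T \mid W)$" can be done by enumerating every tag sequence, which costs $|\text{tagset}|^n$ evaluations. The sketch below does that for the example sentence, ranking candidates by $P(T)\,P(W \mid T)$ under bigram assumptions; this is proportional to $P(T \mid W)$ because $P(W)$ is fixed for the sentence. The five-tag tagset and all probabilities are hypothetical, and `<s>` is an assumed start marker.

```python
from itertools import product

# Hypothetical bigram HMM parameters; "<s>" marks the sentence start.
TAGS = ["NN", "VB", "IN", "DT", "JJ"]
TRANS = {  # TRANS[prev][tag] = P(tag | prev)
    "<s>": {"VB": 0.3, "NN": 0.3, "DT": 0.3, "IN": 0.05, "JJ": 0.05},
    "VB": {"NN": 0.5, "DT": 0.3, "IN": 0.1, "JJ": 0.05, "VB": 0.05},
    "NN": {"IN": 0.3, "NN": 0.2, "VB": 0.3, "DT": 0.1, "JJ": 0.1},
    "IN": {"DT": 0.6, "NN": 0.3, "JJ": 0.1},
    "DT": {"NN": 0.6, "JJ": 0.4},
    "JJ": {"NN": 0.9, "JJ": 0.1},
}
EMIT = {  # EMIT[tag][word] = P(word | tag)
    "NN": {"heat": 0.002, "water": 0.003, "vessel": 0.001},
    "VB": {"heat": 0.001, "water": 0.001},
    "IN": {"in": 0.3},
    "DT": {"a": 0.4},
    "JJ": {"large": 0.01},
}


def score(words, tags):
    """P(T) * P(W | T) under bigram assumptions; proportional to P(T | W)."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= TRANS.get(prev, {}).get(t, 0.0) * EMIT.get(t, {}).get(w, 0.0)
        prev = t
    return p


def brute_force_tag(words):
    """Try all len(TAGS) ** len(words) tag sequences and keep the highest-scoring one."""
    return max(product(TAGS, repeat=len(words)), key=lambda tags: score(words, tags))


words = "heat water in a large vessel".split()
print(brute_force_tag(words))
```
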
13. Stochastic Tagging
    - But $P(T \mid W)$ is difficult to compute directly, so Bayes' rule is used: $P(x \mid y) = P(x)\,P(y \mid x) / P(y)$
    - Applied to the sequence of words, the most probable tag sequence maximizes $P(T \mid W) = P(T)\,P(W \mid T) / P(W)$.
    - $P(W)$ is the same for every candidate tag sequence, so it does not need to be calculated.
    - Thus, the most probable tag sequence maximizes the product of two probabilities for each possible sequence:
        - the prior probability of the tag sequence (context), $P(T)$;
        - the likelihood of the sequence of words given a sequence of (hidden) tags, $P(W \mid T)$.
14. Stochastic Tagging
    - Two simplifications are used to compute the most probable sequence of tags:
        - The prior probability of a word's part-of-speech tag depends only on the tag of the previous word (bigrams: the context is reduced to the previous tag). This facilitates the computation of the prior, $P(T) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$, with $t_0$ a sentence-start marker. Example: the probability of a noun after a determiner.
        - The probability of a word depends only on its part-of-speech tag (it is independent of the other words in the context). This facilitates the computation of the likelihood, $P(W \mid T) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$. Example: given the tag noun, the probability of the word "dog".
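
With both simplifications in place, the best tag sequence can be found without enumerating every sequence by using dynamic programming. The slides do not name a search procedure; the sketch below uses the Viterbi algorithm, the standard choice for this model. All probabilities are hypothetical, the sentence is "heat water in a large vessel", and `<s>` is an assumed start marker.

```python
# Hypothetical bigram HMM parameters; "<s>" marks the sentence start.
TAGS = ["NN", "VB", "IN", "DT", "JJ"]
TRANS = {  # TRANS[prev][tag] = P(tag | prev)
    "<s>": {"VB": 0.3, "NN": 0.3, "DT": 0.3, "IN": 0.05, "JJ": 0.05},
    "VB": {"NN": 0.5, "DT": 0.3, "IN": 0.1, "JJ": 0.05, "VB": 0.05},
    "NN": {"IN": 0.3, "NN": 0.2, "VB": 0.3, "DT": 0.1, "JJ": 0.1},
    "IN": {"DT": 0.6, "NN": 0.3, "JJ": 0.1},
    "DT": {"NN": 0.6, "JJ": 0.4},
    "JJ": {"NN": 0.9, "JJ": 0.1},
}
EMIT = {  # EMIT[tag][word] = P(word | tag)
    "NN": {"heat": 0.002, "water": 0.003, "vessel": 0.001},
    "VB": {"heat": 0.001, "water": 0.001},
    "IN": {"in": 0.3},
    "DT": {"a": 0.4},
    "JJ": {"large": 0.01},
}


def p_trans(prev, tag):
    return TRANS.get(prev, {}).get(tag, 0.0)


def p_emit(tag, word):
    return EMIT.get(tag, {}).get(word, 0.0)


def viterbi(words):
    """Most probable tag sequence under P(T) ~ prod P(t_i|t_{i-1}) and P(W|T) ~ prod P(w_i|t_i)."""
    # best[i][t]: probability of the best tag path for words[:i+1] that ends in tag t
    best = [{t: p_trans("<s>", t) * p_emit(t, words[0]) for t in TAGS}]
    back = [{t: None for t in TAGS}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] * p_trans(p, t))
            best[i][t] = best[i - 1][prev] * p_trans(prev, t) * p_emit(t, words[i])
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


words = "heat water in a large vessel".split()
print(viterbi(words))
```
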
15. Stochastic Tagging
    - Based on the probability of a certain tag occurring, given the various possibilities.
    - Requires a training corpus.
    - There are no probabilities for words that are not in the corpus.
    - The training corpus may be too different from the test corpus.
16. Stochastic Tagging (cont.)
    - Simple method: choose the most frequent tag in the training text for each word.
        - Result: 90% accuracy.
        - Why is this useful? It serves as a baseline: other methods will do better.
        - HMM tagging is an example of such a method.
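
The most-frequent-tag baseline and the corpus limitations listed above can be illustrated together. A minimal sketch, assuming tiny hypothetical training and test sets; unseen words fall back to the most frequent tag overall, which is our own choice and not something the slides specify. The two errors show the weaknesses listed: "race" is tagged without looking at context, and the unseen word "cats" gets a guessed tag.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_words):
    """Map each word to its most frequent tag; also return the most frequent tag overall."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for w, t in tagged_words:
        per_word[w][t] += 1
        overall[t] += 1
    lexicon = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    fallback = overall.most_common(1)[0][0]  # assumption: tag used for unseen words
    return lexicon, fallback


def accuracy(tagged_words, lexicon, fallback):
    """Fraction of tokens whose baseline tag matches the gold tag."""
    correct = sum(lexicon.get(w, fallback) == t for w, t in tagged_words)
    return correct / len(tagged_words)


# Hypothetical training and test data
train_data = [
    ("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
    ("the", "DT"), ("race", "NN"), ("ended", "VBD"),
    ("dogs", "NNS"), ("race", "VB"),
    ("race", "NN"), ("cars", "NNS"), ("need", "VBP"), ("fuel", "NN"),
]
test_data = [
    ("dogs", "NNS"), ("race", "VB"),
    ("the", "DT"), ("race", "NN"), ("ended", "VBD"),
    ("cats", "NNS"),
]
lexicon, fallback = train_baseline(train_data)
acc = accuracy(test_data, lexicon, fallback)
print(acc)
```
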
17. Thank You…