Parts of Speech Tagging

  1. by Mohd. Yaseen Ansari, from TE CSE, under the guidance of Prof. Mrs. A. R. Kulkarni
  2. Introduction; Principle; Parts of Speech Classes; What is POS Tagging good for?; Tag Set; Tag Set Example; Why is POS Tagging Hard?; Methods for POS Tagging; Stochastic POS Tagging; Definition of Hidden Markov Model; HMM for Tagging; Viterbi Tagging; Viterbi Algorithm; An Example
  3. Definition: Parts of Speech Tagging is defined as the task of labeling each word in a sentence with its appropriate part of speech.
     Example: The mother kissed the baby on the cheek. → The[AT] mother[NN] kissed[VBD] the[AT] baby[NN] on[PRP] the[AT] cheek[NN].
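As a quick illustration (not part of the slides): NLTK's off-the-shelf tagger produces exactly this kind of labeling, though with Penn Treebank tags rather than the AT/PRP notation above. A minimal sketch, assuming nltk is installed (resource names may vary slightly across NLTK versions):

```python
import nltk

# One-time model downloads (standard NLTK resource names).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The mother kissed the baby on the cheek."
print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# e.g. [('The', 'DT'), ('mother', 'NN'), ('kissed', 'VBD'), ('the', 'DT'),
#       ('baby', 'NN'), ('on', 'IN'), ('the', 'DT'), ('cheek', 'NN'), ('.', '.')]
```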
  4. [Slide diagram: each word of the example sentence linked to its part of speech — The (Article), mother (Noun), kissed (Verb), the (Article), baby (Noun), on (Preposition), the (Article), cheek (Noun).]
  5. Parts of speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. A large percentage of word-forms are ambiguous. For example:
     The sailor dogs the barmaid.
     Even "dogs", which is usually thought of as just a plural noun, can also be a verb.
  6. There are two classes for parts of speech:
     1) Open classes: nouns, verbs, adjectives, adverbs, etc.
     2) Closed classes:
        a) Conjunctions: and, or, but, etc.
        b) Pronouns: I, she, him, etc.
        c) Prepositions: with, on, under, etc.
        d) Determiners: the, a, an, etc.
        e) Auxiliary verbs: can, could, may, etc.
        and there are many others.
  7. 1) Useful in:
        a) Information Retrieval
        b) Text to Speech
        c) Word Sense Disambiguation
     2) Useful as a preprocessing step for parsing – a unique tag for each word reduces the number of parses.
  8. POS tagging needs a tag set so that there is no ambiguity in assigning one tag to each part of speech. Four tag sets are widely used:
     1) Brown Corpus – 87 tags
     2) Penn Treebank – 45 tags
     3) British National Corpus – 61 tags
     4) C7 – 164 tags
     There are also tag sets that include tags for phrases.
  9. [Tag set example: PRP (personal pronoun) and PRP$ (possessive pronoun).]
  10. POS tagging is usually ambiguous, which is why the right tag for each word cannot be found easily. For example, suppose we want to translate the ambiguous sentence:
      Time flies like an arrow.
      Possibilities:
      1) Time/NN flies/NN like/VB an/AT arrow/NN.
      2) Time/VB flies/NN like/IN an/AT arrow/NN.
      3) Time/NN flies/VBZ like/IN an/AT arrow/NN.
      Here 3) is correct, but notice how many possibilities there are (a counting sketch follows below) and that we do not know in advance which one to choose; only someone with a good command of grammar and vocabulary can tell the difference.
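To make "how many possibilities" concrete: if each word admits several candidate tags, the number of candidate sequences is the product of the per-word counts. A toy sketch (the per-word tag candidates here are illustrative assumptions, not a real lexicon):

```python
from math import prod

# Hypothetical tag candidates per word for "Time flies like an arrow."
candidates = {
    "Time": ["NN", "VB"],
    "flies": ["NN", "VBZ"],
    "like": ["VB", "IN"],
    "an": ["AT"],
    "arrow": ["NN"],
}
# Each combination of choices is one candidate tagging of the sentence.
print(prod(len(tags) for tags in candidates.values()))  # 2*2*2*1*1 = 8
```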
  11. 1) Rule-based POS tagging
         * e.g., ENGTWOL tagger
         * large collection (> 1000) of constraints on what sequences of tags are allowable
      2) Stochastic (probabilistic) tagging
         * e.g., HMM tagger
         * I'll discuss this in a bit more detail
      3) Transformation-based tagging
         * e.g., Brill's tagger
         * a combination of rule-based and stochastic methodologies.
  12. Input: a string of words and a tagset (e.g., "Book that flight", Penn Treebank tagset)
      Output: a single best tag for each word (e.g., Book/VB that/DT flight/NN ./.)
      Problem: resolve ambiguity → disambiguation
      Example: book (Hand me that book. / Book that flight.)
  13. Set of states – all possible tags
      Output alphabet – all words in the language
      State/tag transition probabilities
      Initial state probabilities: the probability of beginning a sentence with tag t, P(t0 → t)
      Output probabilities – producing word w at state t
      Output sequence – the observed word sequence
      State sequence – the underlying tag sequence
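A hypothetical sketch of these components as plain Python tables (toy numbers, names invented for illustration); the later sketches reuse these `trans`/`emit` shapes:

```python
# Set of states: all possible tags; output alphabet: all words.
tags = ["AT", "NN", "VB", "IN"]
vocab = ["the", "dog", "runs", "on"]

# State/tag transition probabilities P(tj -> tk); "START" plays the role of t0,
# so trans[("START", t)] is the initial state probability P(t0 -> t).
trans = {("START", "AT"): 0.7, ("AT", "NN"): 0.8, ("NN", "VB"): 0.5, ("VB", "IN"): 0.4}

# Output probabilities: P(w | t), producing word w at state t.
emit = {("AT", "the"): 0.9, ("NN", "dog"): 0.3, ("VB", "runs"): 0.2, ("IN", "on"): 0.5}
```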
  14. First-order (bigram) Markov assumptions:
      1) Limited horizon: a tag depends only on the previous tag
         P(ti+1 = tk | t1 = tj1, …, ti = tji) = P(ti+1 = tk | ti = tj)
      2) Time invariance: no change over time
         P(ti+1 = tk | ti = tj) = P(t2 = tk | t1 = tj) = P(tj → tk)
      Output probabilities:
      1) Probability of getting word wk for tag tj: P(wk | tj)
      2) Assumption: not dependent on other tags or words!
  15. Probability of a tag sequence:
      P(t1 t2 … tn) = P(t1) P(t1 → t2) P(t2 → t3) … P(tn-1 → tn)
      Assume a starting tag t0:
                    = P(t0 → t1) P(t1 → t2) P(t2 → t3) … P(tn-1 → tn)
      Probability of word sequence and tag sequence:
      P(W, T) = Πi P(ti-1 → ti) P(wi | ti)
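A minimal sketch of this factorization in code (log space to avoid numeric underflow; it assumes the hypothetical `trans`/`emit` tables from the sketch above):

```python
import math

def joint_log_prob(words, tag_seq, trans, emit):
    """log P(W, T) = sum_i [ log P(t_{i-1} -> t_i) + log P(w_i | t_i) ]."""
    logp = 0.0
    prev = "START"  # the assumed starting tag t0
    for w, t in zip(words, tag_seq):
        logp += math.log(trans[(prev, t)]) + math.log(emit[(t, w)])
        prev = t
    return logp

# Example: log P("the dog runs", AT NN VB) under the toy tables.
print(joint_log_prob(["the", "dog", "runs"], ["AT", "NN", "VB"], trans, emit))
```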
  16. Labeled training data = each word has a POS tag. Thus:
      PMLE(tj) = C(tj) / N
      PMLE(tj → tk) = C(tj, tk) / C(tj)
      PMLE(wk | tj) = C(tj : wk) / C(tj)
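These estimates are just relative counts. A small sketch (the function name and corpus format are illustrative assumptions):

```python
from collections import Counter

def mle_estimates(tagged_sents):
    """MLE from a labeled corpus: tagged_sents is a list of [(word, tag), ...]."""
    trans_c, emit_c, tag_c = Counter(), Counter(), Counter()
    for sent in tagged_sents:
        prev = "START"
        tag_c["START"] += 1
        for word, tag in sent:
            trans_c[(prev, tag)] += 1   # C(tj, tk)
            emit_c[(tag, word)] += 1    # C(tj : wk)
            tag_c[tag] += 1             # C(tj)
            prev = tag
    # P_MLE(tj -> tk) = C(tj, tk) / C(tj);  P_MLE(wk | tj) = C(tj : wk) / C(tj)
    trans = {(a, b): n / tag_c[a] for (a, b), n in trans_c.items()}
    emit = {(t, w): n / tag_c[t] for (t, w), n in emit_c.items()}
    return trans, emit

trans, emit = mle_estimates([[("the", "AT"), ("dog", "NN"), ("runs", "VB")]])
print(trans[("START", "AT")], emit[("NN", "dog")])  # 1.0 1.0
```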
  17. 1) D(0, START) = 0
      2) for each tag t ≠ START do: D(0, t) = −∞
      3) for i ← 1 to N do:
             for each tag tj do:
                 D(i, tj) ← maxk [ D(i−1, tk) + lm(tk → tj) + lm(wi | tj) ]
                 record best(i, j) = k which yielded the max
      4) log P(W, T) = maxj D(N, tj)
      5) reconstruct the path from maxj backwards
      where lm(.) = log m(.), and D(i, tj) is the maximal joint log-probability of the state and word sequences up to position i, ending at tj.
      Complexity: O(Nt² · N), for Nt tags and N words.
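A compact sketch of this algorithm in Python (log space; dictionary-based `trans`/`emit` tables as in the sketches above; unseen pairs are treated as probability zero):

```python
import math

def viterbi(words, tagset, trans, emit):
    """Most probable tag sequence under the bigram HMM (Viterbi decoding)."""
    def lm(p):  # lm(.) = log m(.), with log 0 = -infinity
        return math.log(p) if p > 0 else float("-inf")

    D = [{"START": 0.0}]   # D(0, START) = 0; all other tags implicitly -inf
    best = [{}]
    for i, w in enumerate(words, start=1):
        D.append({}); best.append({})
        for tj in tagset:
            # D(i, tj) = max_k [ D(i-1, tk) + lm(tk -> tj) + lm(wi | tj) ]
            k, score = max(
                ((tk, D[i - 1][tk] + lm(trans.get((tk, tj), 0))) for tk in D[i - 1]),
                key=lambda ks: ks[1],
            )
            D[i][tj] = score + lm(emit.get((tj, w), 0))
            best[i][tj] = k   # record best(i, j) = k which yielded the max
    # log P(W, T*) = max_j D(N, tj); reconstruct the path backwards.
    t = max(D[-1], key=D[-1].get)
    path = [t]
    for i in range(len(words), 1, -1):
        t = best[i][t]
        path.append(t)
    return list(reversed(path))

print(viterbi(["the", "dog", "runs"], ["AT", "NN", "VB", "IN"], trans, emit))
# -> ['AT', 'NN', 'VB'] with the toy tables above
```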
  18. Most probable tag sequence given the text:
      T* = arg maxT Pm(T | W)
         = arg maxT Pm(W | T) Pm(T) / Pm(W)   (Bayes' theorem)
         = arg maxT Pm(W | T) Pm(T)   (Pm(W) is constant for all T)
         = arg maxT Πi [ m(ti−1 → ti) m(wi | ti) ]
         = arg maxT Σi log[ m(ti−1 → ti) m(wi | ti) ]
      There is an exponential number of possible tag sequences – use dynamic programming for efficient computation.
  19. Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
      People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
      to/TO race/???   the/DT race/???
      ti = arg maxj P(tj | ti−1) P(wi | tj)
      max[ P(VB|TO) P(race|VB), P(NN|TO) P(race|NN) ]
      From the Brown corpus:
      P(NN|TO) P(race|NN) = .021 × .00041 = .000007
      P(VB|TO) P(race|VB) = .34 × .00003 = .00001
      So "race" after "to" is tagged VB.
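For completeness, the slide's comparison as a one-line check (inputs copied from the slide; the products are approximate because the slide's inputs are rounded):

```python
p_nn = 0.021 * 0.00041  # P(NN|TO) * P(race|NN)
p_vb = 0.34 * 0.00003   # P(VB|TO) * P(race|VB)
print("race/VB" if p_vb > p_nn else "race/NN")  # -> race/VB
```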
