Parts of Speech Tagging

by Mohd. Yaseen Ansari, TE CSE

under the guidance of Prof. Mrs. A. R. Kulkarni
Introduction
Principle
Parts of Speech Classes
What is POS Tagging good for?
Tag Set
Tag Set Example
Why is POS Tagging Hard?
Methods for POS Tagging
Stochastic POS Tagging
Definition of Hidden Markov Model
HMM for Tagging
Viterbi Tagging
Viterbi Algorithm
An Example
Definition

            Parts of Speech Tagging is defined as the task
of labeling each word in a sentence with its appropriate
part of speech.

Example

          The mother kissed the baby on the cheek.

       The[AT] mother[NN] kissed[VBD] the[AT]
baby[NN] on[IN] the[AT] cheek[NN].
The      – Article
mother   – Noun
kissed   – Verb
the      – Article
baby     – Noun
on       – Preposition
the      – Article
cheek    – Noun
Parts of speech tagging is harder than just having a list of
words and their parts of speech, because some words can
represent more than one part of speech at different
times, and because some parts of speech are complex or
unspoken. A large percentage of word-forms are
ambiguous. For example,

The sailor dogs the barmaid.
Even "dogs", which is usually thought of as just a plural
noun, can also be a verb.
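This ambiguity is easy to see in code: a plain lexicon lookup returns more than one candidate tag for "dogs", so a tagger must use context to choose. A minimal sketch (the toy lexicon and its tags are illustrative assumptions, not corpus data):

```python
# A bare lexicon lookup cannot disambiguate: "dogs" has two candidate
# tags, and nothing in the lexicon says which one applies here.
LEXICON = {
    "the": ["AT"],            # article
    "sailor": ["NN"],         # noun
    "dogs": ["NNS", "VBZ"],   # plural noun OR 3rd-person verb ("to dog")
    "barmaid": ["NN"],
}

def candidate_tags(sentence):
    """Return each word paired with its list of possible tags."""
    return [(w, LEXICON.get(w.lower(), ["UNK"])) for w in sentence.split()]

for word, tags in candidate_tags("The sailor dogs the barmaid"):
    print(word, tags)
```

The tagger's job is exactly to pick one tag from each such candidate list.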
There are two classes of parts of speech:

1) Open Classes:- nouns, verbs, adjectives, adverbs, etc.

2) Closed Classes:-

a) Conjunctions:- and, or, but, etc.
b) Pronouns:- I, she, him, etc.
c) Prepositions:- with, on, under, etc.
d) Determiners:- the, a, an, etc.
e) Auxiliary verbs:- can, could, may, etc.

and there are many others.
1) Useful in -
       a) Information Retrieval
       b) Text to Speech
       c) Word Sense Disambiguation

2) Useful as a preprocessing step for parsing –
a unique tag for each word reduces the number of parses.
For POS tagging, a tag set is needed so that exactly one
well-defined tag can be assigned to each part of speech.
Four tag sets are widely used:

1) Brown Corpus – 87 tags
2) Penn Treebank – 45 tags
3) British National Corpus – 61 tags
4) C7 – 164 tags

Some tag sets also include tags for phrases.
Tag set example (Penn Treebank):

PRP  – personal pronoun (I, she, him)
PRP$ – possessive pronoun (my, her, his)
POS tagging is often ambiguous, so the right tag for each word
cannot always be chosen easily. For example, consider tagging
the ambiguous sentence

Time flies like an arrow.

Possibilities:-
1) Time/NN flies/NN like/VB an/AT arrow/NN.

2) Time/VB flies/NN like/IN an/AT arrow/NN.

3) Time/NN flies/VBZ like/IN an/AT arrow/NN.

Here 3) is the intended reading, but notice how many
possibilities there are and that we don't know exactly which
one to choose. Only a good command of grammar and vocabulary
makes the difference.
1) Rule-Based POS tagging
* e.g., ENGTWOL Tagger
* large collection (> 1000) of constraints on what
sequences of tags are allowable

2) Stochastic (Probabilistic) tagging
* e.g., HMM Tagger
* I’ll discuss this in a bit more detail

3) Transformation-based tagging
* e.g., Brill’s tagger
* Combination of Rule-Based and Stochastic
methodologies.
Input:- a string of words, tagset (ex. Book that flight, Penn
Treebank tagset)

Output:- a single best tag for each word (ex. Book/VB
that/DT flight/NN ./.)

Problem:- resolve ambiguity → disambiguation
Example: book (Hand me that book. / Book that flight.)
Set of states – all possible tags
Output alphabet – all words in the language
State/tag transition probabilities – P(tj → tk)
Initial state probabilities – the probability of beginning a
 sentence with a tag t: P(t0 → t)
Output probabilities – producing word w at state t: P(w | t)
Output sequence – observed word sequence
State sequence – underlying tag sequence
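These components can be written down directly as a small data structure. A minimal sketch, where all numbers are invented toy probabilities rather than corpus estimates:

```python
# The HMM components above, as plain dictionaries.
hmm = {
    "states": ["AT", "NN", "VB"],                  # set of states = tags
    "alphabet": ["the", "dog", "runs"],            # output alphabet = words
    "initial": {"AT": 0.6, "NN": 0.3, "VB": 0.1},  # P(t0 -> t)
    "transition": {("AT", "NN"): 0.8,              # P(tj -> tk)
                   ("NN", "VB"): 0.5},
    "emission": {("AT", "the"): 0.9,               # P(w | t)
                 ("NN", "dog"): 0.2,
                 ("VB", "runs"): 0.1},
}

# The initial state probabilities form a distribution over tags.
assert abs(sum(hmm["initial"].values()) - 1.0) < 1e-9
```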
First-order (bigram) Markov assumptions:

  1) Limited horizon: a tag depends only on the previous tag
       P(ti+1 = tk | t1 = tj1, …, ti = tji) = P(ti+1 = tk | ti = tj)

  2) Time invariance: no change over time
       P(ti+1 = tk | ti = tj) = P(t2 = tk | t1 = tj) = P(tj → tk)

Output probabilities:

  1) Probability of getting word wk for tag tj: P(wk | tj)

  2) Assumption: the emitted word depends only on its own tag –
     not on other tags or words!
Probability of a tag sequence:

P(t1t2…tn) = P(t1) P(t1 → t2) P(t2 → t3) … P(tn-1 → tn)

Assume t0 – starting tag:
                = P(t0 → t1) P(t1 → t2) P(t2 → t3) … P(tn-1 → tn)

Probability of word sequence and tag sequence:

P(W,T) = Πi P(ti-1 → ti) P(wi | ti)
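The joint probability P(W,T) is just the product above, computed term by term. A minimal sketch with made-up transition and emission tables (assumptions, not corpus estimates):

```python
from math import prod  # Python 3.8+

def joint_prob(words, tags, trans, emit, start="START"):
    """P(W,T) = product over i of P(t_{i-1} -> t_i) * P(w_i | t_i)."""
    prev_tags = [start] + tags[:-1]
    return prod(trans[(prev, t)] * emit[(t, w)]
                for prev, t, w in zip(prev_tags, tags, words))

# Toy probabilities for illustration:
trans = {("START", "AT"): 0.6, ("AT", "NN"): 0.8}
emit  = {("AT", "the"): 0.3, ("NN", "dog"): 0.01}

p = joint_prob(["the", "dog"], ["AT", "NN"], trans, emit)
# 0.6 * 0.3 * 0.8 * 0.01 = 0.00144
```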
Labeled training = each word has a POS tag

Thus:
         PMLE(tj) = C(tj) / N
         PMLE(tj → tk) = C(tj, tk) / C(tj)
         PMLE(wk | tj) = C(tj : wk) / C(tj)
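These relative-frequency estimates are plain counts over the labeled data. A minimal sketch (the `START` bookkeeping and the input format of (word, tag) pairs are my assumptions):

```python
from collections import Counter

def mle_estimates(tagged_sentences):
    """MLE estimates from labeled sentences of (word, tag) pairs."""
    tag_c, bigram_c, emit_c, n = Counter(), Counter(), Counter(), 0
    for sent in tagged_sentences:
        prev = "START"
        for word, tag in sent:
            n += 1
            tag_c[tag] += 1             # C(tj)
            bigram_c[(prev, tag)] += 1  # C(tj, tk)
            emit_c[(tag, word)] += 1    # C(tj : wk)
            prev = tag
    n_sents = len(tagged_sentences)     # C(START) = number of sentences
    p_tag = {t: c / n for t, c in tag_c.items()}
    p_trans = {(a, b): c / (n_sents if a == "START" else tag_c[a])
               for (a, b), c in bigram_c.items()}
    p_emit = {(t, w): c / tag_c[t] for (t, w), c in emit_c.items()}
    return p_tag, p_trans, p_emit

# One labeled sentence is enough to see the counts at work:
p_tag, p_trans, p_emit = mle_estimates([[("the", "AT"), ("dog", "NN")]])
```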
1) D(0, START) = 0
2) for each tag t ≠ START do: D(1, t) = −∞
3) for i ← 1 to N do:
      for each tag tj do:
         D(i, tj) ← maxk [ D(i−1, tk) + lm(tk → tj) ] + lm(wi | tj)
         Record best(i, j) = k which yielded the max

1) log P(W,T) = maxj D(N, tj)
2) Reconstruct path from maxj backwards

Where: lm(.) = log m(.) and D(i, tj) is the max joint log
probability of the state and word sequences up to position i,
ending at tj.
Complexity: O(Nt² · N) for Nt tags and N words
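The steps above can be sketched directly in log space; the table D and the backpointers best follow the pseudocode, and the tiny model at the bottom is an invented example, not corpus data:

```python
import math

def viterbi(words, tags, lm_trans, lm_emit, start="START"):
    """D(i, t) = max_k D(i-1, k) + lm(k -> t) + lm(w_i | t),
    with backpointers best(i, t) to reconstruct the best tag path."""
    NEG = float("-inf")
    D = [{start: 0.0}]          # D(0, START) = 0
    best = [{}]
    for i, w in enumerate(words, 1):
        D.append({}); best.append({})
        for t in tags:
            score, arg = NEG, None
            for k in D[i - 1]:  # max over previous tags
                s = (D[i - 1][k]
                     + lm_trans.get((k, t), NEG)   # lm(tk -> tj)
                     + lm_emit.get((t, w), NEG))   # lm(wi | tj)
                if s > score:
                    score, arg = s, k
            D[i][t], best[i][t] = score, arg
    # log P(W, T*) = max_t D(N, t); walk backpointers to recover the path
    t = max(D[-1], key=D[-1].get)
    path = [t]
    for i in range(len(words), 1, -1):
        path.append(best[i][t])
        t = best[i][t]
    return list(reversed(path))

# Toy log-probability tables (assumptions, not estimates):
lt = {("START", "AT"): math.log(0.9), ("START", "NN"): math.log(0.1),
      ("AT", "NN"): math.log(0.8), ("NN", "NN"): math.log(0.2)}
le = {("AT", "the"): math.log(0.6), ("NN", "dog"): math.log(0.5)}
```

Missing transitions and emissions default to −∞, i.e. they are treated as impossible; a real tagger would smooth these instead.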
Most probable tag sequence given text:

      T*     = arg maxT Pm(T | W)
             = arg maxT Pm(W | T) Pm(T) / Pm(W)
                    (Bayes’ Theorem)
             = arg maxT Pm(W | T) Pm(T)
                    (Pm(W) is constant for all T)
             = arg maxT Πi [ m(ti-1 → ti) m(wi | ti) ]
             = arg maxT Σi log[ m(ti-1 → ti) m(wi | ti) ]

Exponential number of possible tag sequences – use
dynamic programming for efficient computation
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB
tomorrow/NN

People/NNS continue/VBP to/TO inquire/VB the/DT
reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

to/TO race/???

the/DT race/???

ti = argmaxj P(tj | ti-1) P(wi | tj)

max[ P(VB|TO) P(race|VB) , P(NN|TO) P(race|NN) ]

Brown corpus estimates:
P(NN|TO) = .021  ×  P(race|NN) = .00041  →  .000007
P(VB|TO) = .34   ×  P(race|VB) = .00003  →  .00001

so "race" after "to" is tagged as a verb.
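The comparison above is a one-liner to check, using the Brown-corpus estimates quoted on the slide:

```python
# Choosing the tag for "race" after "to": compare P(tag|TO) * P(race|tag).
p_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN)
p_vb = 0.34  * 0.00003   # P(VB|TO) * P(race|VB)
choice = "VB" if p_vb > p_nn else "NN"
print(choice)  # VB: "to race" is tagged as a verb
```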