Issues in POS tagging
Thennarasu Sakkan
Department of Linguistics
Central University of Kerala
A word usually takes one part of speech in a given context.
Tagging therefore means resolving lexical ambiguity.
For example,
avaḷPR_PRP cantaiyilN_NN kattiN_NN
viṟṟāḷV_VM_VF .RD_PUNC
For example,
paccaiJJ miḷakāyilN_NN namakkuPR_PRP
teriyātaV_VM_VNF palaJJ uṭalN_NN
nalaJJ/N_NN payaṉkaḷN_NN
aṭaṅkiyuḷḷatuV_VM_VF .RD_PUNC
intaDM_DMD paccaiN_NN mikavumRP_INTF
meṇmaiyākaRB irukkiṟatuV_VM_VF
.RD_PUNC
One of the main reasons for incorporating
tagging is to reduce ambiguity.
Fruit flies like a Banana.
Fruit/NNP flies/VBZ like/IN a/DT Banana/NNP
./.
Fruit/NNP Flies/NNP like/VBP a/DT
Banana/NNP ./.
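The two readings above differ only in the tags given to "flies" and "like". A minimal Python sketch, assuming the word/TAG notation of the English example; the helper names `parse_tagged` and `differing_tokens` are illustrative and do not come from any library:

```python
def parse_tagged(sentence: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST slash, so './.' becomes ('.', '.')
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def differing_tokens(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (word, tag_in_a, tag_in_b) for positions where the tags disagree."""
    diffs = []
    for (word_a, tag_a), (_, tag_b) in zip(parse_tagged(a), parse_tagged(b)):
        if tag_a != tag_b:
            diffs.append((word_a, tag_a, tag_b))
    return diffs


reading_1 = "Fruit/NNP flies/VBZ like/IN a/DT Banana/NNP ./."
reading_2 = "Fruit/NNP Flies/NNP like/VBP a/DT Banana/NNP ./."

for word, tag_1, tag_2 in differing_tokens(reading_1, reading_2):
    print(f"{word}: {tag_1} vs {tag_2}")
```

Only the positions whose tags disagree are reported, which is where a tagger has to make a decision.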
We need to normalize the corpus, which makes
the tagging process very complex.
For example: malaiyaṭivārattilN_NN
kōviloṉṟuṇṭuV_VM_VF .RD_PUNC
Issues in Tamil POS Tagging?
‘paṭi’ (படி) in Tamil: A Corpus-based study shows:
a. அவன்PR_PRP படியில்N_NN எறினான்V_VM_VF .RD_PUNC
(Noun)
b. அவள்PR_PRP இன்று காலையில்N_NN படித்தாள்V_VM_VF
.RD_PUNC (Verb)
c. அவர்PR_PRP படித்தV_VM_VNF புத்தகம்N_NN தான்RD_PRD
இதுPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. அவர்PR_PRP வாய்க்குவந்தபடிRB பேசினார்V_VM_VF
.RD_PUNC (Adverb)
e. அம்மாவின்N_NN கணக்குப்படிN_NN எனக்குPR_PRP
வயதுN_NN 30QT_QTC. (Particle)
f. நான்PR_PRP அந்தDM_DMD நேரத்தில்N_NN பத்தாம்QT_QTO
வகுப்புN_NN படித்துV_VM_VNF வந்தேன்V_VM_VF .RD_PUNC
(Verbal participle)
g. நான்PR_PRP படிக்கV_VM_VINF போறேன்V_VM_VF
.RD_PUNC (Infinitive Verb)
Issues in Tamil POS Tagging?
a. avaṉPR_PRP paṭiyilN_NN eṟiṉāṉV_VM_VF .RD_PUNC
(Noun)
b. avaḷPR_PRP iṉṟu kālaiyilN_NN paṭittāḷV_VM_VF
.RD_PUNC (Verb)
c. avarPR_PRP paṭittaV_VM_VNF puttakamN_NN tāṉRD_PRD
ituPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF
.RD_PUNC (Adverb)
e. ammāviṉN_NN kaṇakkuppaṭiN_NN eṉakkuPR_PRP
vayatuN_NN 30QT_QTC. (Particle)
f. nāṉPR_PRP antaDM_DMD nērattilN_NN pattāmQT_QTO
vakuppuN_NN paṭittuV_VM_VNF vantēṉV_VM_VF
.RD_PUNC
(Verbal participle)
g. nāṉPR_PRP paṭikkaV_VM_VINF pōṟēṉV_VM_VF
.RD_PUNC (Infinitive Verb)
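The spread of tags over forms built on 'paṭi' can be tabulated with a few lines of Python. This is only a sketch: the (word, tag) pairs are copied by hand from examples a to g, the function name `tags_for_stem` is hypothetical, and plain substring matching stands in for real morphological analysis.

```python
from collections import defaultdict

# (word, tag) pairs for the forms built on 'paṭi' in examples a-g above.
tagged_forms = [
    ("paṭiyil", "N_NN"),
    ("paṭittāḷ", "V_VM_VF"),
    ("paṭitta", "V_VM_VNF"),
    ("vāykkuvantapaṭi", "RB"),
    ("kaṇakkuppaṭi", "N_NN"),
    ("paṭittu", "V_VM_VNF"),
    ("paṭikka", "V_VM_VINF"),
]


def tags_for_stem(pairs, stem):
    """Group the words containing `stem` by the tag they carry."""
    seen = defaultdict(list)
    for word, tag in pairs:
        if stem in word:  # crude substring test, an assumption of this sketch
            seen[tag].append(word)
    return dict(seen)


for tag, words in sorted(tags_for_stem(tagged_forms, "paṭi").items()):
    print(tag, words)
```

One surface stem ends up under five different tags, which is the ambiguity these slides illustrate.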
Issues in Tamil POS Tagging
uḷnāṭṭuttuṟaiN_NN amaiccarākaRB iruntaV_VM_VNF
ciN_NNP .RD_PUNC rājakōpālāccāriyārN_NNP
avarkaḷālPR_PRP pārāḷumaṉṟattilN_NN
iccaṭṭamN_NN aṟimukappaṭuttappaṭṭatuV_VM_VF
.RD_PUNC
eṉ.PR_PRP nēruN_NNP avarkaḷPR_PRP ṭelliN_NNP
ceṉṟārV_VM_VF .RD_PUNC
ippattirikaiN_NN kaṭciyaipN_NN
paravapV_VM_VINF perumJJ
toṇṭāṟṟiyatuV_VM_VF .RD_PUNC
appāN_NN nilattaiN_NN eṭṭuQT_QTC pākaṅkaḷākaN_NN
pirittārV_VM_VF .RD_PUNC
Issues in Tamil POS Tagging?
kallūrikkuN_NN vēkamākaRB vantēṉV_VM_VF .RD_PUNC
avarPR_PRP oruQT_QTC nallaJJ āciriyarākaN_NN
iruntārV_VM_VF .RD_PUNC
avarPR_PRP terutteruvākaRB aḻaintārV_VM_VF .RD_PUNC
avaṉPR_PRP nallaJJ paḻamākaN_NN pārttuV_VM_VNF
vāṅkiṉāṉV_VM_VF .RD_PUNC
aṉṟaiyaN_NST kālakaṭṭattilN_NN ,RD_PUNC nilaN_NN vaḻiyākaPSP
intiyāvukkuN_NNP vaḻiN_NN kaṇṭupiṭikkappaṭṭiruntatuV_VM_VF .RD_PUNC
Issues in Tamil POS Tagging?
avarPR_PRP aḻakāṉaJJ kātalN_NN kavitaikaḷaiN_NN
eḻutiṉārV_VM_VF .RD_PUNC
itaṟkākaPR_PRP oruQT_QTC muṟaiN_NN
nākarkōvilukkumN_NNP tiruvaṉantapurattiṟkumN_NNP
pōṉōmV_VM_VF .RD_PUNC
eṉPR_PRP utaviyāḷarāṉaN_NN ivaraiPR_PRP
nāṉPR_PRP aṉuppukiṟēṉV_VM_VF .RD_PUNC
Issues in Tamil POS Tagging?
imDM_DMD mātiriyāṉaJJ takutiN_NN etuvumPR_PRQ
tēvaiyillaiV_VM_VF eṉṟuCC_CCS_UT avarPR_PRP
karutiyiruntārV_VM_VF .RD_PUNC
vārttaikaḷaikN_NN koṇṭuPSP caṉamN_NN eḻutupavarākaV_VM_VNV
,RD_PUNC 'rattaN_NNP pācam'N_NNP nāṭakamN_NN mūlamPSP
aṟimukamāṉavarV_VM_VF šrītarN_NNP .RD_PUNC
eṅkaḷPR_PRP potuN_NN naṇparumN_NN ,RD_PUNC
malaiyāḷaN_NNP eḻuttāḷarumāṉaRD_UNK pālN_NNP cakkariyāN_NNP
eṉṉaicPR_PRP cantikkumpaṭiV_VM_VNF avaraiPR_PRP
aṉuppiyirukkiṟārV_VM_VF .RD_PUNC
nāṉumPR_PRP koñcamQT_QTF 'RD_PUNC
kaḻaṉṟaV_VM_VNF 'RD_PUNC āḷN_NN
eṉpatālV_VM_VNG ,RD_PUNC oppukV_VM_VNF
koṇṭēṉV_VAUX_VF .RD_PUNC
(Accept, admit)
(eṭuttukV_VM_VNF koḷḷaV_VM_VINF vēṇṭumV_VAUX_VF)
nāṅkaḷPR_PRP tiruvaṉantapurattiṟkupN_NNP
pōṉapōtuV_VM_VNF ,RD_PUNC citrāñcaliN_NNP
sṭūṭiyōN_NNP mūṭiyiruntatuV_VM_VF .RD_PUNC
aṅkuN_NST iruntaV_VM_VNF cimeṇṭN_NN peñciṉN_NN
ōrQT_QTC ōrattilN_NN nāṉPR_PRP uṭkārntuV_VM_VNF
koṇṭēṉV_VAUX_VF .RD_PUNC
aṭūrN_NNP kōpālakiruṣṇaṉN_NNP vēṟuCC_CCS
ūrilN_NN illaiV_VM_VF/RP_NEG .RD_PUNC
kaviyaracuN_NNP vairamuttuviṉN_NNP kārN_NN
ṭaivarN_NN kaṇṇaṉN_NNP coṉṉārV_VM_VF .RD_PUNC
Proper nouns
nēruN_NNP,
makātmā kāntiN_NNP,
intirākāntiN_NNP,
ampētkārN_NNP
1003-mQT_QTO āṇṭilN_NN irājaN_NNP
rājaN_NNP cōḻaṉN_NNP periyaN_NNP
kōyilaikN_NNP kaṭṭatV_VM_VINF
toṭaṅkiṉāṉV_VM_VF .RD_PUNC
Another tokenization issue concerns compound words
and multiword expressions
māṭṭu vaṇṭiN_NN
kāli ḵpiḷavarN_NN
muṭṭaikkōsN_NN
muṭṭaik N_NN kōsN_NN
vēḷāṇmaiN_NN uṟpattip N_NN poruḷkaḷ N_NN
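One common way to handle such expressions is to join them into a single token before tagging. The sketch below uses greedy longest-match lookup against a small lexicon; the lexicon entries come from the examples above, while the function name `merge_multiword` and the input token list (an arbitrary sequence, not a real sentence) are assumptions made for illustration.

```python
def merge_multiword(tokens, lexicon):
    """Greedily join the longest token sequence that appears in the lexicon."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones (minimum two tokens).
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lexicon:
                merged.append(" ".join(candidate))
                i += n
                break
        else:
            # No multiword entry starts here: keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical lexicon built from the examples above.
mwe_lexicon = {
    ("māṭṭu", "vaṇṭi"),
    ("muṭṭaik", "kōs"),
    ("vēḷāṇmai", "uṟpattip", "poruḷkaḷ"),
}

tokens = ["appā", "māṭṭu", "vaṇṭi", "vēḷāṇmai", "uṟpattip", "poruḷkaḷ"]
print(merge_multiword(tokens, mwe_lexicon))
```

Whether a compound should receive one tag or one tag per part remains an annotation decision; the code only shows how the first option could be applied consistently.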
Titles of books and names of movies
S. Ramakrishnan's short stories
naṭantuV_VM_VNF cellumV_VM_VNF nīrūṟṟuN_NN
appōtum kaṭalN_NN pārttukkoṇṭiruntatuV_VM_VF
.RD_PUNC
mīṇṭumRB varuvēṉV_VM_VF by intirāN_NNP
ceḷantirrājaṉN_NNP
Cinema
nītāṉēPR_PRP eṉPR_PRP poṉJJ
vacantamN_NN
naṭuvulaN_NST koñcamQT_QTF
pakkattaN_NN kāṇōmV_VM_VF .RD_PUNC
nāṉēPR_PRP varuvēṉV_VM_VF .RD_PUNC
eppaṭi maṉacukkuḷ vantāyV_VM_VF
.RD_PUNC
Abbreviations
pa.ja.ka., a.ti.mu.ka., ti.mu.ka., i.ā.pa., Dr., Mr.,
Mrs.
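Abbreviations matter for tokenization because their periods must not be split off as sentence punctuation. A minimal sketch, assuming a hand-made abbreviation list (each entry written with its final period) and an invented test string; `tokenize` is a hypothetical helper, not a library function:

```python
# Abbreviations from the list above, each written with its final period.
ABBREVIATIONS = {"pa.ja.ka.", "a.ti.mu.ka.", "ti.mu.ka.", "i.ā.pa.", "Dr.", "Mr.", "Mrs."}


def tokenize(text):
    """Split on whitespace, then detach a final period unless the token is a known abbreviation."""
    tokens = []
    for chunk in text.split():
        if chunk.endswith(".") and len(chunk) > 1 and chunk not in ABBREVIATIONS:
            tokens.extend([chunk[:-1], "."])
        else:
            tokens.append(chunk)
    return tokens


# Invented input, used only to exercise the function.
print(tokenize("Dr. kaṇṇaṉ coṉṉār."))
```

A closed list like this misses unseen abbreviations and initials, so a real tokenizer would need additional rules or a larger resource.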
Word ambiguity
āṭuN_NN/V_VM_VF, kaṭṭuN_NN/V_VM_VF,
kātaliN_NN/V_VM_VF, colN_NN/V_VM_VF
paccaiJJ/N_NN
paccaik kāykaṟiN_NN
paccaip poyN_NN
paccai uṭampuN_NN
Ambiguity at structural level
nāṉPR_PRP kumāriyōṭuN_NNP kumāraiN_NN
pārttēṉV_VM_VF .RD_PUNC
nāṉum kumāriyum kumāraip pārttōmā
nāṉ kumār kumāriyōṭu iruppataip pārttēṉ
avaṉPR_PRP neytāṉV_VM_VF viṟṟāṉV_VM_VF
.RD_PUNC
For a POS tagger, the major issues that need
to be dealt with are:
1. Fineness vs. coarseness of the linguistic analysis
2. Syntactic function vs. lexical category
3. New tags vs. tags from a standard tagger
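The first issue can be made concrete with the tags used in these slides, which look hierarchical (for example V_VM_VF). Assuming that each underscore separates one level of the hierarchy, which is an interpretation rather than something the slides state, a fine-grained tag can be reduced to a coarser one by truncation. The function name `coarsen` is hypothetical.

```python
def coarsen(tag: str, depth: int = 1) -> str:
    """Keep only the first `depth` underscore-separated levels of a tag."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return "_".join(tag.split("_")[:depth])


for tag in ["N_NN", "N_NNP", "V_VM_VF", "V_VM_VNF", "V_VAUX_VF", "QT_QTC"]:
    print(tag, "->", coarsen(tag, 1), "/", coarsen(tag, 2))
```

Truncating keeps the coarse categories consistent, but it also discards distinctions such as finite versus non-finite verbs, which is the trade-off named in point 1.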
Content
Eby's argument on POS
Extracting rules from tagged data
Mubeena's query on VBZ
Multiple Choice Questions
kaviyaracuN_NNP vairamuttuviṉN_NNP kārN_NN
ṭaivarN_NN kaṇṇaṉN_NNP coṉṉārV_VM_VF
.RD_PUNC
eṅkaḷPR_PRP potuN_NN naṇparumN_NN
,RD_PUNC malaiyāḷaN_NNP
eḻuttāḷarumāṉaRD_UNK pālN_NNP
cakkariyāN_NNP eṉṉaicPR_PRP
cantikkumpaṭiV_VM_VNF avaraiPR_PRP
aṉuppiyirukkiṟārV_VM_VF .RD_PUNC
Eby's argument on POS
a. avaṉPR_PRP paṭiyilN_NN eṟiṉāṉV_VM_VF .RD_PUNC
(Noun)
b. avaḷPR_PRP iṉṟu kālaiyilN_NN paṭittāḷV_VM_VF
.RD_PUNC (Verb)
c. avarPR_PRP paṭittaV_VM_VNF puttakamN_NN tāṉRD_PRD
ituPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF
.RD_PUNC (Adverb)
e. ammāviṉN_NN kaṇakkuppaṭiN_NN eṉakkuPR_PRP
vayatuN_NN 30QT_QTC. (Particle)
f. nāṉPR_PRP antaDM_DMD nērattilN_NN pattāmQT_QTO
vakuppuN_NN paṭittuV_VM_VNF vantēṉV_VM_VF
.RD_PUNC
(Verbal participle)
g. nāṉPR_PRP paṭikkaV_VM_VINF pōṟēṉV_VM_VF
.RD_PUNC (Infinitive Verb)
Extracting rules from tagged data
The rules for tagging the lexical item ‘paṭi’ can be described as follows:
if {# word contains a plural marker and/or a case marker: tag as noun
}
elsif {# word contains a tense marker + -a/-um: tag as relative participle
}
elsif {# word takes a tense marker: tag as verb
}
elsif {# paṭi comes after a noun: tag as particle
}
elsif {# paṭi occurs after a verb + past tense + -a and is followed by a finite verb:
tag as adverb
}
elsif {# paṭi comes after a verb + past tense + -u and is followed by a finite verb:
tag as verbal participle
}
else {# none of the rules above applies
}
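The rule cascade can be sketched in Python as follows. This is not the author's implementation: the `PatiFeatures` record and its field names are hypothetical, and the sketch assumes that some earlier morphological analysis has already produced these features. The relative-participle test is placed before the general verb test, since a form with a tense marker plus -a/-um would otherwise be caught by the verb rule first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatiFeatures:
    """Hypothetical features for one token involving 'paṭi'."""
    has_plural_or_case: bool = False   # e.g. a locative ending, as in paṭiyil
    has_tense: bool = False            # the token carries a tense marker
    ending: Optional[str] = None       # suffix after the tense marker: "a", "um", ...
    attached_to: Optional[str] = None  # "noun", "verb+past+a" or "verb+past+u"
    next_is_finite_verb: bool = False  # the following word is a finite verb


def tag_pati(f: PatiFeatures) -> str:
    """Apply the rules in order and return the category name used in the rules."""
    if f.has_plural_or_case:
        return "noun"
    # Checked before the plain verb rule because it is the more specific case.
    if f.has_tense and f.ending in {"a", "um"}:
        return "relative participle"
    if f.has_tense:
        return "verb"
    if f.attached_to == "noun":
        return "particle"
    if f.attached_to == "verb+past+a" and f.next_is_finite_verb:
        return "adverb"
    if f.attached_to == "verb+past+u" and f.next_is_finite_verb:
        return "verbal participle"
    return "unknown"  # the final 'else' branch is left unspecified in the rules


print(tag_pati(PatiFeatures(has_plural_or_case=True)))     # features assumed for paṭiyil
print(tag_pati(PatiFeatures(has_tense=True, ending="a")))  # features assumed for paṭitta
print(tag_pati(PatiFeatures(attached_to="noun")))          # features assumed for kaṇakkuppaṭi
```

Because the rules are tried in sequence, their order is part of the grammar: swapping two branches can change the tag assigned to the same word.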
Mubeena's query on VBZ
Multiple Choice Questions
1. Who defined computational linguistics as the study of
computer systems for understanding and generating
natural language?
2. ________ is a simple yet powerful programming
language with excellent functionality for processing
linguistic data.
3. What does NLTK stand for?
4. Name any scripting language.
5. What does the command tr 'a-z' 'A-Z' < inputfile > outputfile
do in Linux?
7. What does tr 'aiou' e < inputfile > outputfile do?
8. What does tr -c 'A-Za-z' '\012' < inputfile > outputfile do?
9. What are the salient features of a corpus?
10. _________ means to divide into parts and describe
the relations among the parts.
11. The word parser is typically restricted to the _______
level analyzer.
12. Shallow parsing is also known as __________ parsing.
13. Parsed corpora are sometimes known as ________.
14. The simplest n-gram model is called the ______
model.
15. What are the six requirements kept in mind while designing
NLTK?
16. Could you list the Microsoft keyboard key functions for the
consonants in your language?
17. What is the relevance of an annotated corpus?
18. Who first argued for the relevance of shallow parsing?
19. Describe the chunking tags, with an example for each.