Issues in POS tagging
Thennarasu Sakkan
Department of Linguistics
Central University of Kerala
Usually one part-of-speech per word.
Resolving lexical ambiguity.
For Example,
avaḷPR_PRP cantaiyilN_NN kattiN_NN
viṟṟāḷV_VM_VF .RD_PUNC
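The examples in these slides write each word fused with its tag (for instance avaḷPR_PRP). The following is a minimal sketch of how such tokens could be split back into (word, tag) pairs. It assumes that tags consist only of uppercase ASCII letters and underscores and that the transliterated words contain no uppercase letters; the name `split_tagged` is hypothetical.

```python
import re

# Assumption: a tag is uppercase ASCII letters/underscores, optionally with
# "/"-separated alternatives (e.g. JJ/N_NN); the word is whatever precedes it.
TOKEN = re.compile(r"(.+?)([A-Z][A-Z_]*(?:/[A-Z][A-Z_]*)*)")

def split_tagged(sentence):
    """Split fused word+tag tokens into (word, tag) pairs; tag is None if absent."""
    pairs = []
    for token in sentence.split():
        match = TOKEN.fullmatch(token)
        if match:
            pairs.append((match.group(1), match.group(2)))
        else:
            pairs.append((token, None))  # untagged token, kept as-is
    return pairs

print(split_tagged("avaḷPR_PRP cantaiyilN_NN kattiN_NN viṟṟāḷV_VM_VF .RD_PUNC"))
```

Tokens that carry no tag are returned with `None`, so gaps in the annotation stay visible rather than being guessed.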
For Example,
paccaiJJ miḷakāyilN_NN namakkuPR_PRP
teriyātaV_VM_VNF palaJJ uṭalN_NN
nalaJJ/N_NN payaṉkaḷN_NN
aṭaṅkiyuḷḷatuV_VM_VF .RD_PUNC
intaDM_DMD paccaiN_NN mikavumRP_INTF
meṇmaiyākaRB irukkiṟatuV_VM_VF
.RD_PUNC
One of the main reasons for POS tagging is to
reduce ambiguity.
Fruit flies like a banana.
Fruit/NN flies/VBZ like/IN a/DT banana/NN
./.
Fruit/NN flies/NNS like/VBP a/DT
banana/NN ./.
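To see exactly where the two readings diverge, the tag sequences can be compared position by position. This is a small sketch only; the tags follow common Penn Treebank usage, which is an assumption about the intended analyses, and the helper names are hypothetical.

```python
def parse(tagged):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in tagged.split()]

def tag_differences(a, b):
    """Return (word, tag_a, tag_b) for each position where the tags differ."""
    return [
        (word_a, tag_a, tag_b)
        for (word_a, tag_a), (_, tag_b) in zip(parse(a), parse(b))
        if tag_a != tag_b
    ]

# Assumed Penn Treebank-style analyses of the two readings.
reading_1 = "Fruit/NN flies/VBZ like/IN a/DT banana/NN ./."
reading_2 = "Fruit/NN flies/NNS like/VBP a/DT banana/NN ./."

print(tag_differences(reading_1, reading_2))
```

Only "flies" and "like" change tags; once those two tags are fixed, the sentence has a single reading.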
The corpus needs to be normalized first, which makes
the tagging process very complex.
For example: malaiyaṭivārattilN_NN
kōviloṉṟuṇṭuV_VM_VF .RD_PUNC
(kōviloṉṟuṇṭu fuses kōvil + oṉṟu + uṇṭu into a single written token.)
Issues in Tamil POS Tagging?
‘paṭi’ (படி) in Tamil: A Corpus-based study shows:
a. அவன்PR_PRP படியில்N_NN எறினான்V_VM_VF .RD_PUNC
(Noun)
b. அவள்PR_PRP இன்று காலையில்N_NN படித்தாள்V_VM_VF
.RD_PUNC (Verb)
c. அவர்PR_PRP படித்தV_VM_VNF புத்தகம்N_NN தான்RD_PRD
இதுPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. அவர்PR_PRP வாய்க்குவந்தபடிRB பேசினார்V_VM_VF
.RD_PUNC (Adverb)
e. அம்மாவின்N_NN கணக்குப்படிN_NN எனக்குPR_PRP
வயதுN_NN 30QT_QTC. (Particle)
f. நான்PR_PRP அந்தDM_DMD நேரத்தில்N_NN பத்தாம்QT_QTO
வகுப்புN_NN படித்துV_VM_VNF வந்தேன்V_VM_VF .RD_PUNC
(Verbal participle)
g. நான்PR_PRP படிக்கV_VM_VINF போறேன்V_VM_VF
.RD_PUNC (Infinitive Verb)
a. avaṉPR_PRP paṭiyilN_NN eṟiṉāṉV_VM_VF .RD_PUNC
(Noun)
b. avaḷPR_PRP iṉṟu kālaiyilN_NN paṭittāḷV_VM_VF
.RD_PUNC (Verb)
c. avarPR_PRP paṭittaV_VM_VNF puttakamN_NN tāṉRD_PRD
ituPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF
.RD_PUNC (Adverb)
e. ammāviṉN_NN kaṇakkuppaṭiN_NN eṉakkuPR_PRP
vayatuN_NN 30QT_QTC. (Particle)
f. nāṉPR_PRP antaDM_DMD nērattilN_NN pattāmQT_QTO
vakuppuN_NN paṭittuV_VM_VNF vantēṉV_VM_VF
.RD_PUNC
(Verbal participle)
g. nāṉPR_PRP paṭikkaV_VM_VINF pōṟēṉV_VM_VF
.RD_PUNC (Infinitive Verb)
uḷnāṭṭuttuṟaiN_NN amaiccarākaRB iruntaV_VM_VNF
ciN_NNP .RD_PUNC rājakōpālāccāriyārN_NNP
avarkaḷālPR_PRP pārāḷumaṉṟattilN_NN
iccaṭṭamN_NN aṟimukappaṭuttappaṭṭatuV_VM_VF
.RD_PUNC
eṉ.PR_PRP nēruN_NNP avarkaḷPR_PRP ṭelliN_NNP
ceṉṟārV_VM_VF .RD_PUNC
ippattirikaiN_NN kaṭciyaipN_NN
paravapV_VM_VINF perumJJ
toṇṭāṟṟiyatuV_VM_VF .RD_PUNC
appāN_NN nilattaiN_NN eṭṭuQT_QTC pākaṅkaḷākaN_NN
pirittārV_VM_VF .RD_PUNC
kallūrikkuN_NN vēkamākaRB vantēṉV_VM_VF .RD_PUNC
avarPR_PRP oruQT_QTC nallaJJ āciriyarākaN_NN
iruntārV_VM_VF .RD_PUNC
avarPR_PRP terutteruvākaRB aḻaintārV_VM_VF .RD_PUNC
avaṉPR_PRP nallaJJ paḻamākaN_NN pārttuV_VM_VNF
vāṅkiṉāṉV_VM_VF .RD_PUNC
aṉṟaiyaN_NST kālakaṭṭattilN_NN ,RD_PUNC nilaN_NN vaḻiyākaPSP
intiyāvukkuN_NNP vaḻiN_NN kaṇṭupiṭikkappaṭṭiruntatuV_VM_VF .RD_PUNC
avarPR_PRP aḻakāṉaJJ kātalN_NN kavitaikaḷaiN_NN
eḻutiṉārV_VM_VF .RD_PUNC
itaṟkākaPR_PRP oruQT_QTC muṟaiN_NN
nākarkōvilukkumN_NNP tiruvaṉantapurattiṟkumN_NNP
pōṉōmV_VM_VF .RD_PUNC
eṉPR_PRP utaviyāḷarāṉaN_NN ivaraiPR_PRP
nāṉPR_PRP aṉuppukiṟēṉV_VM_VF .RD_PUNC
imDM_DMD mātiriyāṉaJJ takutiN_NN etuvumPR_PRQ
tēvaiyillaiV_VM_VF eṉṟuCC_CCS_UT avarPR_PRP
karutiyiruntārV_VM_VF .RD_PUNC
vārttaikaḷaikN_NN koṇṭuPSP caṉamN_NN eḻutupavarākaV_VM_VNV
,RD_PUNC 'rattaN_NNP pācam'N_NNP nāṭakamN_NN mūlamPSP
aṟimukamāṉavarV_VM_VF šrītarN_NNP .RD_PUNC
eṅkaḷPR_PRP potuN_NN naṇparumN_NN ,RD_PUNC
malaiyāḷaN_NNP eḻuttāḷarumāṉaRD_UNK pālN_NNP cakkariyāN_NNP
eṉṉaicPR_PRP cantikkumpaṭiV_VM_VNF avaraiPR_PRP
aṉuppiyirukkiṟārV_VM_VF .RD_PUNC
nāṉumPR_PRP koñcamQT_QTF 'RD_PUNC
kaḻaṉṟaV_VM_VNF 'RD_PUNC āḷN_NN
eṉpatālV_VM_VNG ,RD_PUNC oppukV_VM_VNF
koṇṭēṉV_VAUX_VF .RD_PUNC
(Accept, admit)
(eṭuttukV_VM_VNF koḷḷaV_VM_VINF vēṇṭumV_VAUX_VF)
nāṅkaḷPR_PRP tiruvaṉantapurattiṟkupN_NNP
pōṉapōtuV_VM_VNF ,RD_PUNC citrāñcaliN_NNP
sṭūṭiyōN_NNP mūṭiyiruntatuV_VM_VF .RD_PUNC
aṅkuN_NST iruntaV_VM_VNF cimeṇṭN_NN peñciṉN_NN
ōrQT_QTC ōrattilN_NN nāṉPR_PRP uṭkārntuV_VM_VNF
koṇṭēṉV_VAUX_VF .RD_PUNC
aṭūrN_NNP kōpālakiruṣṇaṉN_NNP vēṟuCC_CCS
ūrilN_NN illaiV_VM_VF/RP_NEG .RD_PUNC
kaviyaracuN_NNP vairamuttuviṉN_NNP kārN_NN
ṭaivarN_NN kaṇṇaṉN_NNP coṉṉārV_VM_VF .RD_PUNC
Proper nouns
nēruN_NNP,
makātmā kāntiN_NNP,
intirākāntiN_NNP,
ampētkārN_NNP
1003-mQT_QTO āṇṭilN_NN irājaN_NNP
rājaN_NNP cōḻaṉN_NNP periyaN_NNP
kōyilaikN_NNP kaṭṭatV_VM_VINF
toṭaṅkiṉāṉV_VM_VF .RD_PUNC
Another tokenization issue concerns compound words
and multi-word expressions
māṭṭu vaṇṭiN_NN
kāli ḵpiḷavarN_NN
muṭṭaikkōsN_NN
muṭṭaik N_NN kōsN_NN
vēḷāṇmaiN_NN uṟpattip N_NN poruḷkaḷ N_NN
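One common way to handle such compounds is to merge them into single tokens before tagging, using a lexicon of known multi-word expressions. The sketch below uses greedy longest-match merging. The lexicon, the sample token list and the name `merge_mwes` are hypothetical, and this is not claimed to be the method used for the corpus in these slides.

```python
def merge_mwes(tokens, lexicon):
    """Greedily join the longest token sequence found in the MWE lexicon."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-token sequences.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            merged.append(tokens[i])  # no MWE starts here
            i += 1
    return merged

# Hypothetical lexicon built from the compounds listed above.
LEXICON = {("māṭṭu", "vaṇṭi"), ("vēḷāṇmai", "uṟpattip", "poruḷkaḷ")}

print(merge_mwes(["oru", "māṭṭu", "vaṇṭi"], LEXICON))
```

After merging, the compound receives one tag (for example N_NN) instead of one tag per part.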
Titles of books and names of movies
S. Ramakrishnan's short stories
naṭantuV_VM_VNF cellumV_VM_VNF nīrūṟṟuN_NN
appōtum kaṭalN_NN pārttukkoṇṭiruntatuV_VM_VF
.RD_PUNC
mīṇṭumRB varuvēṉV_VM_VF by intirāN_NNP
ceḷantirrājaṉN_NNP
Cinema
nītāṉēPR_PRP eṉPR_PRP poṉJJ
vacantamN_NN
naṭuvulaN_NST koñcamQT_QTF
pakkattaN_NN kāṇōmV_VM_VF .RD_PUNC
nāṉēPR_PRP varuvēṉV_VM_VF .RD_PUNC
eppaṭi maṉacukkuḷ vantāyV_VM_VF
.RD_PUNC
Abbreviations
pa.ja.ka., a.ti.mu.ka., ti.mu.ka., i.ā.pa., Dr., Mr.,
Mrs.
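Abbreviations matter for tokenization because their periods must not be split off as sentence punctuation. A minimal sketch, assuming a hand-maintained abbreviation list and whitespace-separated input; the sample sentence and the name `tokenize` are illustrative only.

```python
# Assumption: abbreviations are listed explicitly, each with its final period.
ABBREVIATIONS = {"pa.ja.ka.", "a.ti.mu.ka.", "ti.mu.ka.", "i.ā.pa.", "Dr.", "Mr.", "Mrs."}

def tokenize(text):
    """Split on whitespace and detach a final period unless it belongs to an abbreviation."""
    tokens = []
    for chunk in text.split():
        if len(chunk) > 1 and chunk.endswith(".") and chunk not in ABBREVIATIONS:
            tokens.extend([chunk[:-1], "."])
        else:
            tokens.append(chunk)
    return tokens

print(tokenize("Dr. kaṇṇaṉ coṉṉār."))
```

A lookup list like this cannot tell whether an abbreviation also ends the sentence; that case would need additional context.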
Word ambiguity
āṭuN_NN/V_VM_VF, kaṭṭuN_NN/V_VM_VF,
kātaliN_NN/V_VM_VF, colN_NN/V_VM_VF
paccaiJJ/N_NN
paccaik kāykaṟiN_NN
paccaip poyN_NN
paccai uṭampuN_NN
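Ambiguous word forms such as paccai can be found automatically by collecting every tag each form receives in a tagged corpus. The sketch below does this for a handful of (word, tag) pairs taken from the earlier paccai example; the function name `ambiguity_classes` is hypothetical.

```python
from collections import defaultdict

def ambiguity_classes(tagged_tokens):
    """Map each word form that occurs with more than one tag to its sorted tag list."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word].add(tag)
    return {word: sorted(tags) for word, tags in tags_by_word.items() if len(tags) > 1}

# A few (word, tag) pairs from the paccai example sentences.
corpus = [
    ("paccai", "JJ"), ("miḷakāyil", "N_NN"),
    ("inta", "DM_DMD"), ("paccai", "N_NN"), ("mikavum", "RP_INTF"),
]

print(ambiguity_classes(corpus))
```

The resulting word-to-tags table shows which forms a tagger must disambiguate from context.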
Ambiguity at structural level
nāṉPR_PRP kumāriyōṭuN_NNP kumāraiN_NN
pārttēṉV_VM_VF .RD_PUNC
nāṉum kumāriyum kumāraip pārttōmā
nāṉ kumār kumāriyōṭu iruppataip pārttēṉ
avaṉPR_PRP neytāṉV_VM_VF viṟṟāṉV_VM_VF
.RD_PUNC
For a POS tagger, the major issues that need
to be dealt with are:
1. Fineness vs. coarseness in linguistic analysis
2. Syntactic function vs. lexical category
3. New tags vs. tags from a standard tagger
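The fineness vs. coarseness trade-off can be illustrated with the tags used in these slides, which look hierarchical (V_VM_VF, N_NNP). Assuming each underscore separates one level of the hierarchy, a fine-grained tag can be collapsed to a coarser one by keeping only its leading levels. The function name `coarsen` is hypothetical.

```python
def coarsen(tag, depth=1):
    """Keep only the first `depth` underscore-separated levels of a tag."""
    return "_".join(tag.split("_")[:depth])

for fine in ("V_VM_VF", "V_VM_VNF", "N_NNP", "JJ"):
    print(fine, "->", coarsen(fine))
```

Collapsing V_VM_VF and V_VM_VNF to V makes tagging easier but loses the finite/non-finite distinction, which is the trade-off listed under issue 1.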
Content
Eby's argument on POS
Extracting rules from tagged data
Mubeena's query on VBZ
Multiple Choice Questions
kaviyaracuN_NNP vairamuttuviṉN_NNP kārN_NN
ṭaivarN_NN kaṇṇaṉN_NNP coṉṉārV_VM_VF
.RD_PUNC
eṅkaḷPR_PRP potuN_NN naṇparumN_NN
,RD_PUNC malaiyāḷaN_NNP
eḻuttāḷarumāṉaRD_UNK pālN_NNP
cakkariyāN_NNP eṉṉaicPR_PRP
cantikkumpaṭiV_VM_VNF avaraiPR_PRP
aṉuppiyirukkiṟārV_VM_VF .RD_PUNC
Eby's argument on POS
a. avaṉPR_PRP paṭiyilN_NN eṟiṉāṉV_VM_VF .RD_PUNC
(Noun)
b. avaḷPR_PRP iṉṟu kālaiyilN_NN paṭittāḷV_VM_VF
.RD_PUNC (Verb)
c. avarPR_PRP paṭittaV_VM_VNF puttakamN_NN tāṉRD_PRD
ituPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF
.RD_PUNC (Adverb)
e. ammāviṉN_NN kaṇakkuppaṭiN_NN eṉakkuPR_PRP
vayatuN_NN 30QT_QTC. (Particle)
f. nāṉPR_PRP antaDM_DMD nērattilN_NN pattāmQT_QTO
vakuppuN_NN paṭittuV_VM_VNF vantēṉV_VM_VF
.RD_PUNC
(Verbal participle)
g. nāṉPR_PRP paṭikkaV_VM_VINF pōṟēṉV_VM_VF
.RD_PUNC (Infinitive Verb)
Extracting rules from tagged data
The algorithm for the lexical item ‘paṭi’ can be described as follows:
if (the word contains a plural marker and a case marker) {
    tag as noun
}
elsif (the word takes a tense marker) {
    tag as verb
}
elsif (the word contains a tense marker + -a/-um) {
    tag as relative participle
}
elsif (paṭi comes after a noun) {
    tag as particle
}
elsif (paṭi occurs after a verb + past tense + -a and is followed by a finite verb) {
    tag as adverb
}
elsif (paṭi comes after a verb + past tense + -u and is followed by a finite verb) {
    tag as verbal participle
}
else {
    # default case
}
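The rules above can be approximated in Python on the transliterated forms. This is only a sketch under several assumptions: it looks at the word form alone and ignores neighbouring tokens, the past tense marker is taken to be -tt-, the case-ending list is a small hypothetical sample, and forms with plural markers or the -um ending are not covered. It also tests the more specific endings (-a, -u) before the plain tense-marker rule, since otherwise the verb rule would match first.

```python
import unicodedata

def nfc(text):
    """Normalise to NFC so precomposed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)

PATI = nfc("paṭi")
PAST = "tt"  # assumed past tense marker, as in paṭittāḷ
# Hypothetical, deliberately incomplete sample of case endings.
CASE_ENDINGS = {nfc(s) for s in ("il", "ai", "ukku", "āl", "iṉ")}

def classify_pati(word):
    """Return a coarse label for a transliterated word form built on 'paṭi'."""
    word = nfc(word)
    if word == PATI:
        return "unknown"  # the bare form needs sentence context
    if word.startswith(PATI):
        rest = word[len(PATI):]
        if rest.startswith(PAST):
            ending = rest[len(PAST):]
            if ending == "a":
                return "relative participle"
            if ending == "u":
                return "verbal participle"
            return "verb"
        if rest == "kka":
            return "infinitive"
        if rest.lstrip("y") in CASE_ENDINGS:  # glide y before the case ending
            return "noun"
        return "unknown"
    if word.endswith(PATI):
        stem = word[:-len(PATI)]
        # paṭi after a form ending in -a is read as adverbial, else as a particle.
        return "adverb" if stem.endswith("a") else "particle"
    return "unknown"

for form in ("paṭiyil", "paṭittāḷ", "paṭitta", "vāykkuvantapaṭi",
             "kaṇakkuppaṭi", "paṭittu", "paṭikka"):
    print(form, "->", classify_pati(form))
```

The seven forms are the ones from examples a to g; a real tagger would also need the context conditions (the preceding noun or verb, the following finite verb) that the rules mention.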
Mubeena's query on VBZ
Multiple Choice Questions
1. Who defined computational linguistics as the study of
computer systems for understanding and generating
natural language?
2. ________ is a simple yet powerful programming
language with excellent functionality for processing
linguistic data.
3. What does NLTK stand for? _________
4. Name any scripting language.
5. What does the command tr 'a-z' 'A-Z' <
inputfile > outputfile do in Linux?
7. What does tr 'aiou' e < inputfile >
outputfile do?
8. What does tr -c 'A-Za-z' '\012' < inputfile >
outputfile do?
9. What are the salient features of a corpus?
10. _________ means to divide into parts and describe
the relations among the parts.
11. The word "parser" is typically restricted to a _______-
level analyzer.
12. Shallow parsing is also known as __________ parsing.
13. Parsed corpora are sometimes known as ________.
14. The simplest n-gram model is called the ______
model.
15. What six requirements were kept in mind while designing
NLTK?
16. Can you list the Microsoft keyboard keys for the
consonants in your language?
17. What is the relevance of an annotated corpus?
18. Who first argued for the relevance of shallow parsing?
19. Describe the chunking tags, with an example of each.