Issues in POS tagging
Thennarasu Sakkan
Department of Linguistics
Central University of Kerala
Usually a word has one part of speech in a given context, so tagging involves resolving lexical ambiguity. For example:
avaḷPR_PRP cantaiyilN_NN kattiN_NN viṟṟāḷV_VM_VF .RD_PUNC
For example:
paccaiJJ miḷakāyilN_NN namakkuPR_PRP teriyātaV_VM_VNF palaJJ uṭalN_NN nalaJJ/N_NN payaṉkaḷN_NN aṭaṅkiyuḷḷatuV_VM_VF .RD_PUNC
intaDM_DMD paccaiN_NN mikavumRP_INTF meṇmaiyākaRB irukkiṟatuV_VM_VF .RD_PUNC
One of the main reasons for POS tagging is to reduce ambiguity. The sentence "Fruit flies like a banana." has two readings:
Fruit/NN flies/VBZ like/IN a/DT banana/NN ./.
Fruit/NN flies/NNS like/VBP a/DT banana/NN ./.
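The two analyses differ only in the tags for flies and like, which is exactly the ambiguity a tagger has to resolve. A minimal Python sketch that reads the word/TAG notation and reports where two analyses of the same sentence disagree; the function names are illustrative, and the tags follow Penn Treebank conventions:

```python
def parse(tagged):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    # Split on the last '/', so './.' gives ('.', '.').
    return [tok.rpartition("/")[::2] for tok in tagged.split()]


def tag_differences(reading1, reading2):
    """Return (word, tag1, tag2) for every token whose tags disagree."""
    pairs1, pairs2 = parse(reading1), parse(reading2)
    if [w for w, _ in pairs1] != [w for w, _ in pairs2]:
        raise ValueError("the two analyses must cover the same tokens")
    return [(w, t1, t2) for (w, t1), (_, t2) in zip(pairs1, pairs2) if t1 != t2]


verb_reading = "Fruit/NN flies/VBZ like/IN a/DT banana/NN ./."
noun_reading = "Fruit/NN flies/NNS like/VBP a/DT banana/NN ./."
for word, t1, t2 in tag_differences(verb_reading, noun_reading):
    print(f"{word}: {t1} vs {t2}")
# flies: VBZ vs NNS
# like: IN vs VBP
```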
The corpus also needs to be normalized, which makes the tagging process considerably more complex. For example, kōviloṉṟuṇṭu below fuses kōvil, oṉṟu and uṇṭu into a single token:
malaiyaṭivārattilN_NN kōviloṉṟuṇṭuV_VM_VF .RD_PUNC
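One way to normalize such tokens before tagging is dictionary-based segmentation that undoes sandhi. The Python sketch below is an illustration under strong assumptions: the five-word lexicon is hypothetical, and only two junction rules are modelled, elision of a word-final -u before a vowel (oṉṟu + uṇṭu → oṉṟuṇṭu) and a y-glide after a word ending in -i or -e, including -ai (malai + aṭivārattil → malaiyaṭivārattil). A real normalizer would need a full lexicon and many more sandhi rules.

```python
# Hypothetical mini-lexicon built from the example sentence only.
LEXICON = {"malai", "aṭivārattil", "kōvil", "oṉṟu", "uṇṭu"}
VOWELS = set("aāiīuūeēoō")


def junctions(word):
    """Yield (surface form, needs a vowel next) for a word at a word boundary."""
    yield word, False
    if word.endswith("u"):
        yield word[:-1], True            # final -u elided before a vowel
    if word.endswith(("i", "ī", "e", "ē")):  # -ai also ends in "i" here
        yield word + "y", True           # y-glide inserted before a vowel


def segment(token):
    """Return lexicon words that join into token under the two rules, or None."""
    if token in LEXICON:
        return [token]
    for word in sorted(LEXICON, key=lambda w: (-len(w), w)):  # longest first
        for surface, needs_vowel in junctions(word):
            if not token.startswith(surface):
                continue
            rest = token[len(surface):]
            if not rest or (needs_vowel and rest[0] not in VOWELS):
                continue
            tail = segment(rest)
            if tail:
                return [word] + tail
    return None


print(segment("kōviloṉṟuṇṭu"))       # ['kōvil', 'oṉṟu', 'uṇṭu']
print(segment("malaiyaṭivārattil"))  # ['malai', 'aṭivārattil']
```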
Issues in Tamil POS Tagging
A corpus-based study of ‘paṭi’ (படி) in Tamil shows that it occurs in several categories:
a. அவன்PR_PRP படியில்N_NN எறினான்V_VM_VF .RD_PUNC (Noun)
b. அவள்PR_PRP இன்று காலையில்N_NN படித்தாள்V_VM_VF .RD_PUNC (Verb)
c. அவர்PR_PRP படித்தV_VM_VNF புத்தகம்N_NN தான்RD_PRD இதுPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. அவர்PR_PRP வாய்க்குவந்தபடிRB பேசினார்V_VM_VF .RD_PUNC (Adverb)
e. அம்மாவின்N_NN கணக்குப்படிN_NN எனக்குPR_PRP வயதுN_NN 30QT_QTC .RD_PUNC (Particle)
f. நான்PR_PRP அந்தDM_DMD நேரத்தில்N_NN பத்தாம்QT_QTO வகுப்புN_NN படித்துV_VM_VNF வந்தேன்V_VM_VF .RD_PUNC (Verbal participle)
g. நான்PR_PRP படிக்கV_VM_VINF போறேன்V_VM_VF .RD_PUNC (Infinitive Verb)
The same examples in transliteration:
a. avaṉPR_PRP paṭiyilN_NN eṟiṉāṉV_VM_VF .RD_PUNC (Noun)
b. avaḷPR_PRP iṉṟu kālaiyilN_NN paṭittāḷV_VM_VF .RD_PUNC (Verb)
c. avarPR_PRP paṭittaV_VM_VNF puttakamN_NN tāṉRD_PRD ituPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF .RD_PUNC (Adverb)
e. ammāviṉN_NN kaṇakkuppaṭiN_NN eṉakkuPR_PRP vayatuN_NN 30QT_QTC .RD_PUNC (Particle)
f. nāṉPR_PRP antaDM_DMD nērattilN_NN pattāmQT_QTO vakuppuN_NN paṭittuV_VM_VNF vantēṉV_VM_VF .RD_PUNC (Verbal participle)
g. nāṉPR_PRP paṭikkaV_VM_VINF pōṟēṉV_VM_VF .RD_PUNC (Infinitive Verb)
uḷnāṭṭuttuṟaiN_NN amaiccarākaRB iruntaV_VM_VNF ciN_NNP .RD_PUNC rājakōpālāccāriyārN_NNP avarkaḷālPR_PRP pārāḷumaṉṟattilN_NN iccaṭṭamN_NN aṟimukappaṭuttappaṭṭatuV_VM_VF .RD_PUNC
eṉ.PR_PRP nēruN_NNP avarkaḷPR_PRP ṭelliN_NNP ceṉṟārV_VM_VF .RD_PUNC
ippattirikaiN_NN kaṭciyaipN_NN paravapV_VM_VINF perumJJ toṇṭāṟṟiyatuV_VM_VF .RD_PUNC
appāN_NN nilattaiN_NN eṭṭuQT_QTC pākaṅkaḷākaN_NN pirittārV_VM_VF .RD_PUNC
kallūrikkuN_NN vēkamākaRB vantēṉV_VM_VF .RD_PUNC
avarPR_PRP oruQT_QTC nallaJJ āciriyarākaN_NN iruntārV_VM_VF .RD_PUNC
avarPR_PRP terutteruvākaRB aḻaintārV_VM_VF .RD_PUNC
avaṉPR_PRP nallaJJ paḻamākaN_NN pārttuV_VM_VNF vāṅkiṉāṉV_VM_VF .RD_PUNC
aṉṟaiyaN_NST kālakaṭṭattilN_NN ,RD_PUNC nilaN_NN vaḻiyākaPSP intiyāvukkuN_NNP vaḻiN_NN kaṇṭupiṭikkappaṭṭiruntatuV_VM_VF .RD_PUNC
avarPR_PRP aḻakāṉaJJ kātalN_NN kavitaikaḷaiN_NN eḻutiṉārV_VM_VF .RD_PUNC
itaṟkākaPR_PRP oruQT_QTC muṟaiN_NN nākarkōvilukkumN_NNP tiruvaṉantapurattiṟkumN_NNP pōṉōmV_VM_VF .RD_PUNC
eṉPR_PRP utaviyāḷarāṉaN_NN ivaraiPR_PRP nāṉPR_PRP aṉuppukiṟēṉV_VM_VF .RD_PUNC
imDM_DMD mātiriyāṉaJJ takutiN_NN etuvumPR_PRQ tēvaiyillaiV_VM_VF eṉṟuCC_CCS_UT avarPR_PRP karutiyiruntārV_VM_VF .RD_PUNC
vārttaikaḷaikN_NN koṇṭuPSP caṉamN_NN eḻutupavarākaV_VM_VNV ,RD_PUNC 'rattaN_NNP pācam'N_NNP nāṭakamN_NN mūlamPSP aṟimukamāṉavarV_VM_VF šrītarN_NNP .RD_PUNC
eṅkaḷPR_PRP potuN_NN naṇparumN_NN ,RD_PUNC malaiyāḷaN_NNP eḻuttāḷarumāṉaRD_UNK pālN_NNP cakkariyāN_NNP eṉṉaicPR_PRP cantikkumpaṭiV_VM_VNF avaraiPR_PRP aṉuppiyirukkiṟārV_VM_VF .RD_PUNC
nāṉumPR_PRP koñcamQT_QTF 'RD_PUNC kaḻaṉṟaV_VM_VNF 'RD_PUNC āḷN_NN eṉpatālV_VM_VNG ,RD_PUNC oppukV_VM_VNF koṇṭēṉV_VAUX_VF .RD_PUNC
(Accept, admit)
(eṭuttukV_VM_VNF koḷḷaV_VM_VINF vēṇṭumV_VAUX_VF)
nāṅkaḷPR_PRP tiruvaṉantapurattiṟkupN_NNP pōṉapōtuV_VM_VNF ,RD_PUNC citrāñcaliN_NNP sṭūṭiyōN_NNP mūṭiyiruntatuV_VM_VF .RD_PUNC
aṅkuN_NST iruntaV_VM_VNF cimeṇṭN_NN peñciṉN_NN ōrQT_QTC ōrattilN_NN nāṉPR_PRP uṭkārntuV_VM_VNF koṇṭēṉV_VAUX_VF .RD_PUNC
aṭūrN_NNP kōpālakiruṣṇaṉN_NNP vēṟuCC_CCS ūrilN_NN illaiV_VM_VF/RP_NEG .RD_PUNC
kaviyaracuN_NNP vairamuttuviṉN_NNP kārN_NN ṭaivarN_NN kaṇṇaṉN_NNP coṉṉārV_VM_VF .RD_PUNC
Proper nouns
nēruN_NNP, makātmā kāntiN_NNP, intirākāntiN_NNP, ampētkārN_NNP
1003-mQT_QTO āṇṭilN_NN irājaN_NNP rājaN_NNP cōḻaṉN_NNP periyaN_NNP kōyilaikN_NNP kaṭṭatV_VM_VINF toṭaṅkiṉāṉV_VM_VF .RD_PUNC
Another tokenization issue concerns compound words and multiword expressions
māṭṭu vaṇṭiN_NN
kāli ḵpiḷavarN_NN
muṭṭaikkōsN_NN
muṭṭaikN_NN kōsN_NN
vēḷāṇmaiN_NN uṟpattipN_NN poruḷkaḷN_NN
Titles of books and names of movies
S. Ramakrishnan's short stories
naṭantuV_VM_VNF cellumV_VM_VNF nīrūṟṟuN_NN
appōtum kaṭalN_NN pārttukkoṇṭiruntatuV_VM_VF .RD_PUNC
mīṇṭumRB varuvēṉV_VM_VF by intirāN_NNP ceḷantirrājaṉN_NNP
Cinema
nītāṉēPR_PRP eṉPR_PRP poṉJJ vacantamN_NN
naṭuvulaN_NST koñcamQT_QTF pakkattaN_NN kāṇōmV_VM_VF .RD_PUNC
nāṉēPR_PRP varuvēṉV_VM_VF .RD_PUNC
eppaṭi maṉacukkuḷ vantāyV_VM_VF .RD_PUNC
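Expressions such as māṭṭu vaṇṭi and multiword titles can be kept together by merging known sequences into a single token before tagging. A minimal greedy longest-match sketch in Python; the expression list is a hypothetical sample drawn from the examples above, and joining with an underscore is an arbitrary choice:

```python
# Hypothetical list of known multiword expressions, taken from the examples above.
MWES = [
    ("māṭṭu", "vaṇṭi"),
    ("muṭṭaik", "kōs"),
    ("naṭantu", "cellum", "nīrūṟṟu"),
    ("nītāṉē", "eṉ", "poṉ", "vacantam"),
]


def merge_mwes(tokens, mwes=MWES, sep="_"):
    """Greedily join the longest known expression starting at each position."""
    known = set(mwes)
    longest = max(len(m) for m in mwes)
    merged, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in known:
                merged.append(sep.join(tokens[i:i + n]))
                i += n
                break
        else:  # no known expression starts here: keep the single token
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_mwes("naṭantu cellum nīrūṟṟu".split()))  # ['naṭantu_cellum_nīrūṟṟu']
```

The merged token can then receive one tag, as māṭṭu vaṇṭi does above. Greedy matching will also merge a listed sequence where it has a literal, non-title reading; telling the two apart needs context.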
Abbreviation
pa.ja.ka., a.ti.mu.ka., ti.mu.ka., i.ā.pa., Dr., Mr., Mrs.
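Abbreviations and initials interfere with token and sentence boundaries: in the tagged data above, the initial in ci. rājakōpālāccāriyār was split off and tagged .RD_PUNC. A sketch of an abbreviation-aware tokenizer in Python, assuming a hand-made exception list (hypothetical and far from complete) plus a pattern for dotted forms such as ti.mu.ka.:

```python
import re

# Hypothetical exception list: titles from the slide plus the initial "ci."
# (as in "ci. rājakōpālāccāriyār"); a usable list would be much longer.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ci."}
# Two or more dotted segments, e.g. "ti.mu.ka.", "i.ā.pa.", "pa.ja.ka".
DOTTED = re.compile(r"^(?:\w+\.){2,}\w*$")


def tokenize(text):
    """Split off a final period unless the token looks like an abbreviation."""
    tokens = []
    for token in text.split():
        is_abbreviation = token.lower() in ABBREVIATIONS or bool(DOTTED.match(token))
        if token.endswith(".") and len(token) > 1 and not is_abbreviation:
            tokens.extend([token[:-1], "."])
        else:
            tokens.append(token)
    return tokens


print(tokenize("ci. rājakōpālāccāriyār ṭelli ceṉṟār."))
# ['ci.', 'rājakōpālāccāriyār', 'ṭelli', 'ceṉṟār', '.']
```

An abbreviation at the very end of a sentence still carries the sentence's full stop, so a list like this only moves the problem to a smaller set of cases.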
Word ambiguity
āṭuN_NN/V_VM_VF, kaṭṭuN_NN/V_VM_VF,
kātaliN_NN/V_VM_VF, colN_NN/V_VM_VF
paccaiJJ/N_NN
paccaik kāykaṟiN_NN
paccaip poyN_NN
paccai uṭampuN_NN
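Ambiguous words like these can be found semi-automatically by collecting every tag a word receives in a tagged corpus. A Python sketch over sentences in the wordTAG notation used here; the regular expression that separates word and tag is an assumption about that notation, and the two input sentences are the paccai examples from the start of the deck:

```python
import re
from collections import Counter, defaultdict

# Assumed notation: a word with no upper-case letters followed directly by an
# upper-case tag, e.g. "paccaiJJ" or "kattiN_NN". Tokens with two tags
# ("nalaJJ/N_NN") or no tag ("iṉṟu") do not match and are skipped.
TOKEN_RE = re.compile(r"^([^A-Z/]+?)([A-Z]+(?:_[A-Z]+)*)$")


def tag_lexicon(sentences):
    """Map each word to a Counter of the tags it received."""
    lexicon = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split():
            match = TOKEN_RE.match(token)
            if match:
                word, tag = match.groups()
                lexicon[word][tag] += 1
    return lexicon


def ambiguous_words(lexicon):
    """Words seen with more than one tag, in sorted order."""
    return sorted(word for word, tags in lexicon.items() if len(tags) > 1)


corpus = [
    "paccaiJJ miḷakāyilN_NN namakkuPR_PRP teriyātaV_VM_VNF palaJJ uṭalN_NN "
    "nalaJJ/N_NN payaṉkaḷN_NN aṭaṅkiyuḷḷatuV_VM_VF .RD_PUNC",
    "intaDM_DMD paccaiN_NN mikavumRP_INTF meṇmaiyākaRB irukkiṟatuV_VM_VF .RD_PUNC",
]
lexicon = tag_lexicon(corpus)
print(ambiguous_words(lexicon))  # ['paccai']
```

On a larger corpus the same counts also give each word's most frequent tag, a common baseline for judging a tagger.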
Ambiguity at the structural level
nāṉPR_PRP kumāriyōṭuN_NNP kumāraiN_NNP pārttēṉV_VM_VF .RD_PUNC
This can mean either nāṉum kumāriyum kumāraip pārttōm ('Kumari and I saw Kumar') or nāṉ kumār kumāriyōṭu iruppataip pārttēṉ ('I saw Kumar being with Kumari').
avaṉPR_PRP neytāṉV_VM_VF viṟṟāṉV_VM_VF .RD_PUNC
For a POS tagger, the major issues that need to be dealt with are:
1. Fineness vs. coarseness in linguistic analysis
2. Syntactic function vs. lexical category
3. New tags vs. tags from a standard tagger
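The first trade-off can be seen in the hierarchical tags used throughout these examples: a fine tag such as V_VM_VNF names the category (V), the type (VM, main verb) and the form (VNF, non-finite). A one-function Python sketch of coarsening by keeping only the part before the first underscore; applying this to every tag is a simplification, not a rule taken from any tagset guideline:

```python
def coarsen(tag):
    """Keep only the top-level category of a hierarchical tag, e.g. 'V_VM_VNF' -> 'V'."""
    return tag.split("_")[0]


fine_tags = ["V_VM_VF", "V_VM_VNF", "V_VM_VINF", "N_NN", "N_NNP", "RB"]
print([coarsen(t) for t in fine_tags])
# ['V', 'V', 'V', 'N', 'N', 'RB']
```

At the coarse level, paṭittāḷ (V_VM_VF), paṭitta and paṭittu (V_VM_VNF) and paṭikka (V_VM_VINF) all become V: the tagger's job gets easier, but the distinctions the paṭi examples depend on are lost.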
Content
Eby's argument on POS
Extracting rules from tagged data
Mubeena's query on VBZ
Multiple Choice Questions
Eby's argument on POS
kaviyaracuN_NNP vairamuttuviṉN_NNP kārN_NN ṭaivarN_NN kaṇṇaṉN_NNP coṉṉārV_VM_VF .RD_PUNC
eṅkaḷPR_PRP potuN_NN naṇparumN_NN ,RD_PUNC malaiyāḷaN_NNP eḻuttāḷarumāṉaRD_UNK pālN_NNP cakkariyāN_NNP eṉṉaicPR_PRP cantikkumpaṭiV_VM_VNF avaraiPR_PRP aṉuppiyirukkiṟārV_VM_VF .RD_PUNC
Extracting rules from tagged data
a. avaṉPR_PRP paṭiyilN_NN eṟiṉāṉV_VM_VF .RD_PUNC (Noun)
b. avaḷPR_PRP iṉṟu kālaiyilN_NN paṭittāḷV_VM_VF .RD_PUNC (Verb)
c. avarPR_PRP paṭittaV_VM_VNF puttakamN_NN tāṉRD_PRD ituPR_PRP .RD_PUNC (Relative Participle or Adjective)
d. avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF .RD_PUNC (Adverb)
e. ammāviṉN_NN kaṇakkuppaṭiN_NN eṉakkuPR_PRP vayatuN_NN 30QT_QTC .RD_PUNC (Particle)
f. nāṉPR_PRP antaDM_DMD nērattilN_NN pattāmQT_QTO vakuppuN_NN paṭittuV_VM_VNF vantēṉV_VM_VF .RD_PUNC (Verbal participle)
g. nāṉPR_PRP paṭikkaV_VM_VINF pōṟēṉV_VM_VF .RD_PUNC (Infinitive Verb)
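The algorithm for the lexical item ‘paṭi’ can be described as follows. The rules are tried in order, so the participle rules must come before the finite-verb rule: paṭitta and paṭittu also contain a tense marker.
if {# word = paṭi + plural and/or case marker: tag as noun (a. paṭiyil)
}
elsif {# paṭi is attached to a preceding noun: tag as particle (e. kaṇakkuppaṭi)
}
elsif {# paṭi is attached to a verb + past tense + -a and is followed by a finite verb: tag as adverb (d. vāykkuvantapaṭi)
}
elsif {# word = paṭi + past tense + -u, followed by a finite verb: tag as verbal participle (f. paṭittu)
}
elsif {# word = paṭi + tense marker + -a/-um: tag as relative participle (c. paṭitta)
}
elsif {# word = paṭi + tense marker + personal ending: tag as finite verb (b. paṭittāḷ)
}
else {# no rule applies (e.g. the infinitive paṭikka in g.): handle separately
}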
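The ordered rules above can be tried out with a short Python sketch. It is an illustration under stated assumptions, not the author's implementation: the suffix lists, the one-word noun lexicon and the transliterated spellings are hypothetical simplifications chosen to cover examples a to g, and whether the next word is a finite verb is read off its existing tag (V_VM_VF), as in the tagged data. Words that start with paṭi and words that end in it are handled in separate branches that never overlap, so the rule order is preserved.

```python
import re
import unicodedata

# Splits a token in the wordTAG notation, e.g. "paṭittuV_VM_VNF".
TOKEN_RE = re.compile(r"^([^A-Z/]+?)([A-Z]+(?:_[A-Z]+)*)$")

STEM = "paṭi"
# Hypothetical, deliberately tiny inventories (assumptions, not a grammar):
CASE_OR_PLURAL = ("kaḷ", "il", "ai", "āl")   # plural and a few case suffixes
PAST = ("tt", "nt", "t")                     # past-tense markers
TENSE = PAST + ("kkiṟ", "pp")                # plus present and future markers
PERSON = ("ēṉ", "āṉ", "āḷ", "ār", "ōm")       # person-number-gender endings
NOUNS = {"kaṇakku"}                          # nouns that can host -paṭi


def split_token(token):
    """Return (word, tag); tag is None for untagged tokens such as 'iṉṟu'."""
    match = TOKEN_RE.match(token)
    return match.groups() if match else (token, None)


def classify_pati(word, next_tag):
    """Label a word built on 'paṭi', trying the most specific rules first."""
    next_is_finite = next_tag == "V_VM_VF"
    if word.startswith(STEM):
        rest = word[len(STEM):]
        base = rest[1:] if rest.startswith("y") else rest  # drop the y-glide
        if base.startswith(CASE_OR_PLURAL):
            return "noun"                                  # paṭiyil
        if rest in {p + "u" for p in PAST} and next_is_finite:
            return "verbal participle"                     # paṭittu
        if rest in {p + "a" for p in PAST} or rest == "kkum":
            return "relative participle"                   # paṭitta
        if rest in {t + e for t in TENSE for e in PERSON}:
            return "finite verb"                           # paṭittāḷ
    elif word.endswith(STEM):
        prefix = word[:-len(STEM)]
        host = prefix[:-1] if prefix.endswith("p") else prefix  # drop sandhi p
        if host in NOUNS:
            return "particle"                              # kaṇakkuppaṭi
        if prefix.endswith(tuple(p + "a" for p in PAST)) and next_is_finite:
            return "adverb"                                # vāykkuvantapaṭi
    return "unresolved"                                    # e.g. paṭikka


def pati_labels(sentence):
    """Label every paṭi word in a sentence written in the wordTAG notation."""
    pairs = [split_token(t) for t in unicodedata.normalize("NFC", sentence).split()]
    labels = []
    for i, (word, _) in enumerate(pairs):
        if STEM in word:
            next_tag = pairs[i + 1][1] if i + 1 < len(pairs) else None
            labels.append((word, classify_pati(word, next_tag)))
    return labels


print(pati_labels("avarPR_PRP vāykkuvantapaṭiRB pēciṉārV_VM_VF .RD_PUNC"))
# [('vāykkuvantapaṭi', 'adverb')]
```

Every inventory here would need extending before the rules could be applied to a real corpus.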
Mubeena's query on VBZ
Multiple Choice Questions
1. Who defined computational linguistics as the study of computer systems for understanding and generating natural language?
2. ________ is a simple yet powerful programming language with excellent functionality for processing linguistic data.
3. NLTK stands for _________.
4. Name any scripting language.
5. What does the Linux command tr 'a-z' 'A-Z' < inputfile > outputfile do?
7. What does tr 'aiou' e < inputfile > outputfile do?
8. What does tr -c 'A-Za-z' '\012' < inputfile > outputfile do?
9. What are the salient features of a corpus?
10. _________ means to divide into parts and describe the relations among the parts.
11. The term parser is typically restricted to the _______-level analyzer.
12. Shallow parsing is also known as __________ parsing.
13. Parsed corpora are sometimes known as ________.
14. The simplest n-gram model is called the ______ model.
15. What six requirements were kept in mind while designing NLTK?
16. Can you list the Microsoft keyboard key functions for the consonants of your language?
17. What is the relevance of an annotated corpus?
18. Who first argued for the relevance of shallow parsing?
19. Describe the chunking tags, with an example for each.