Developing Dependency Parsers for Tamil
March 27, 2021
K. Sarveswaran (Sarves)
University of Moratuwa
& University of Jaffna
Sri Lanka.
iamsarves@gmail.com
K. Sarveswaran (iamsarves@gmail.com) Tamil Dependency Parser March 27, 2021 1 / 21
Overview
1 Introduction
2 Background
  Natural Language Grammars
  Syntactic parsing
  Treebanks
  Universal Dependencies Treebank
  Dependency Parsers
  Approaches for Developing Parsers
3 Dependency parsing of Tamil
4 How did I develop parsers?
5 ThamizhiPOSt: Part of Speech tagger
6 ThamizhiMorph: Morphological Analyser and Generator
7 LFG-based grammar for Tamil
8 UD-based grammar for Tamil
9 Creation of Treebank
10 Conclusion
Introduction
language processing technologies are now part of our everyday life
tech giants are investing heavily in language technologies
interest in local language computing has been increasing in recent times
Tamil can still be considered a low-resource language, based on the number of publicly available, usable tools and resources
machine learning/deep learning approaches are growing very fast
dependency parsers are crucial tools for syntactic analysis
Natural language grammars
phrase structure grammar (constituency grammar / context-free grammar / generative grammar) and dependency grammar are the two popular grammars used to model natural languages [1]
there are also several formalisms derived from these two, for instance Lexical Functional Grammar
phrase structure grammar: good for languages like English, where the order of words matters
dependency grammar: good for languages that are morphologically rich and have relatively free word order [1]
[1] Jurafsky, D. and Martin, J.H., 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall.
Syntactic parsing
mapping a string of words to its parse tree is called syntactic parsing [2]
[2] parse: to separate a sentence into grammatical parts (Cambridge Dictionary)
Treebanks
a bank of syntactically (and possibly also semantically) annotated sentences, i.e. syntactically parsed sentences
for instance:
  Penn Treebank [3]: a phrase structure treebank
  Universal Dependencies Treebank [4]: a dependency treebank
[3] https://catalog.ldc.upenn.edu/LDC99T42
[4] https://universaldependencies.org/
Universal Dependencies (UD) Treebank
there are several schemes for annotating dependencies, for instance AnnCorra [5] and PDT [6]
Universal Dependencies [7] is a widely used scheme for computational language processing
cross-linguistically consistent treebank annotation for many languages
facilitates multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective
183 treebanks in 104 languages, as of November 2020
widely used for parsing; shared tasks and workshops are organised annually
[5] Bharati, A., Sangal, R., Sharma, D.M. and Bai, L., 2006. AnnCorra: Annotating corpora guidelines for POS and chunk annotation for Indian languages. LTRC-TR31, pp.1-38.
[6] Hajic, J., Vidová-Hladká, B. and Pajas, P., 2001, December. The Prague Dependency Treebank: Annotation structure and support. In Proceedings of the IRCS Workshop on Linguistic Databases (pp. 105-114).
[7] Nivre, J., De Marneffe, M.C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N. and Tsarfaty, R., 2016, May. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 1659-1666).
Universal Dependencies Treebank
consists of POS, lemma, morphology, and dependency annotations arranged in CoNLL-U format, as shown in Figure-1
scheme is amended to accommodate language change
Figure-1
Figure-2
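To make the CoNLL-U layout more concrete, the following is a minimal Python sketch that reads one sentence in that format. The ten column names follow the CoNLL-U specification; the sample sentence is an English toy example (an assumption for illustration, not data from any Tamil treebank), and the helper name parse_conllu_sentence is hypothetical.

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Toy English sentence (an assumption for illustration, not treebank data).
SAMPLE = "\n".join([
    "# text = Ram saw Sita",
    "1\tRam\tRam\tPROPN\t_\tNumber=Sing\t2\tnsubj\t_\t_",
    "2\tsaw\tsee\tVERB\t_\tTense=Past\t0\troot\t_\t_",
    "3\tSita\tSita\tPROPN\t_\tNumber=Sing\t2\tobj\t_\t_",
])


def parse_conllu_sentence(block: str) -> List[Dict[str, str]]:
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
        tokens.append(dict(zip(FIELDS, columns)))
    return tokens


# Print the columns a dependency parser is trained to predict.
for tok in parse_conllu_sentence(SAMPLE):
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

Each token line carries its own head index and relation label, so the dependency tree can be rebuilt from the HEAD and DEPREL columns alone.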
Dependency Parser
Software that produces syntactic parses of a given sentence based on a dependency formalism.
Why:
  useful for developing applications such as grammar checking, semantic interpretation, question answering, and machine translation
  useful for studying the structure of languages and their diachronic and synchronic changes
Challenges:
  one needs a lot of linguistic knowledge to create treebanks
  time consuming: (gold) treebanks are usually created by hand
  there are still a lot of debates on syntax, even for English [8]
  ambiguities are always a problem:
    attachment: Ram saw Sita [with a telescope]
    coordination: old women and men
[8] https://universaldependencies.org/workgroups/core.html
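The attachment ambiguity above can be written down as two competing dependency analyses. The sketch below is illustrative only: the relation labels are UD-style labels chosen as one plausible analysis, not taken from the slides, and the names VERB_ATTACHMENT, NOUN_ATTACHMENT and differing_tokens are hypothetical.

```python
WORDS = ["Ram", "saw", "Sita", "with", "a", "telescope"]

# Each analysis maps a 1-based token index to (head index, relation); head 0 is the root.
SHARED = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj"),
          4: (6, "case"), 5: (6, "det")}

# Reading 1: the telescope is the instrument of seeing, so it attaches to "saw".
VERB_ATTACHMENT = {**SHARED, 6: (2, "obl")}
# Reading 2: Sita has the telescope, so it attaches to "Sita".
NOUN_ATTACHMENT = {**SHARED, 6: (3, "nmod")}


def differing_tokens(a, b):
    """Return the words whose head or relation differs between two analyses."""
    return [WORDS[i - 1] for i in sorted(a) if a[i] != b[i]]


print(differing_tokens(VERB_ATTACHMENT, NOUN_ATTACHMENT))  # ['telescope']
```

The two readings differ in a single head assignment, which is why a parser needs lexical or contextual evidence to choose between them.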
Treebanks - Tamil
only one treebank is publicly available: TamilPDT [9]
TamilPDT was then migrated to UD (as TamilTTB) in November 2015, using scripts
no changes have been made since then
used by several non-Tamil teams for parsing (IWPT 2020 [10])
TamilTTB has several issues:
  tokenisation: for instance, words are broken inappropriately
  dependency issues: for instance, a dative can be a subject, an oblique, or an indirect object in Tamil; however, it is mostly marked as an object
[9] Ramasamy, L. and Žabokrtský, Z., 2011, February. Tamil dependency parsing: results using rule based and corpus based approaches. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 82-95). Springer, Berlin, Heidelberg.
[10] https://universaldependencies.org/iwpt20/enhancements_in_treebanks.html
Approaches for Developing parsers
rule-based approach:
  need to write a lot of rules
  success and coverage depend heavily on the lexicon
  useful for (small) domain-specific parsing
hybrid approach:
  create annotated data
  train a computer program with the annotated data
  annotate more data using the trained program, and repeat this until the accuracy is good enough
  useful for languages like Tamil where we do not have a lot of annotated data
  more robust than the rule-based approach
machine learning based / unsupervised learning:
  research is still in its preliminary stage
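The hybrid approach listed above is essentially a bootstrapping loop. The sketch below shows only the control flow. The "model" is a toy most-frequent-label lookup standing in for a real parser, the human annotator is simulated by an oracle dictionary, and all names (train, pre_annotate, bootstrap, ORACLE) are hypothetical rather than taken from the tools described in these slides.

```python
from collections import Counter, defaultdict

DEFAULT = "NOUN"  # label guessed for words the toy model has never seen


def train(annotated):
    """Train a toy model: remember the most frequent label for each word."""
    counts = defaultdict(Counter)
    for sentence in annotated:
        for word, label in sentence:
            counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def pre_annotate(model, words):
    """Machine step: label a raw sentence with the current model."""
    return [(w, model.get(w, DEFAULT)) for w in words]


def accuracy(model, gold):
    """Fraction of gold tokens that the model labels correctly."""
    total = sum(len(s) for s in gold)
    correct = sum(model.get(w, DEFAULT) == label for s in gold for w, label in s)
    return correct / total


def bootstrap(seed, raw_batches, correct_by_hand, dev, target):
    """Alternate machine pre-annotation and human correction until accuracy >= target."""
    annotated = list(seed)
    model = train(annotated)
    for batch in raw_batches:
        if accuracy(model, dev) >= target:
            break
        drafts = [pre_annotate(model, words) for words in batch]
        annotated.extend(correct_by_hand(draft) for draft in drafts)
        model = train(annotated)  # retrain on the enlarged data set
    return model, annotated


# Simulated annotator: looks up the right label for every word.
ORACLE = {"dogs": "NOUN", "bark": "VERB", "cats": "NOUN",
          "sleep": "VERB", "birds": "NOUN", "sing": "VERB"}


def correct_by_hand(draft):
    return [(w, ORACLE[w]) for w, _ in draft]


seed = [[("dogs", "NOUN"), ("bark", "VERB")]]
raw_batches = [[["cats", "sleep"]], [["birds", "sing"]]]
dev = [[("cats", "NOUN"), ("sing", "VERB")]]

model, annotated = bootstrap(seed, raw_batches, correct_by_hand, dev, target=1.0)
print(len(annotated), accuracy(model, dev))
```

In practice the human step is the expensive one; the point of the loop is that each retrained model produces better drafts, so less manual correction is needed per sentence.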
Dependency Parser for Tamil
a shallow parser for Tamil that identifies phrases with an F-measure of 66.6; tool not found [11]
a dependency parser for Tamil; score 57.50; uses its own specification for annotation; no data or tools found [12]
a dependency parser to parse an ancient poetic text in Tamil; no results reported, no tools found [13]
an SVM-based dependency parser; unlabelled attachment score of 76.26; no tools found [14]
there is also a survey paper on parsing in Tamil [15]
[11] Ariaratnam, I., Weerasinghe, A.R. and Liyanage, C., 2014, December. A shallow parser for Tamil. In 2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer) (pp. 197-203). IEEE.
[12] Selvam, M., Natarajan, A.M. and Thangarajan, R., 2009. Structural parsing of natural language text in Tamil Language using dependency model. International Journal of Computer Processing of Languages, 22(02n03), pp.237-256.
[13] Dhanalakshmi, V., Kumar, M.A. and Murugesan, C., 2012. Dependency Parser for Tamil classical literature-Kurunthokai. INFITT.
[14] Green, N., Ramasamy, L. and Žabokrtský, Z., 2012. Using an SVM ensemble system for improved Tamil dependency parsing. In Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (pp. 72-77).
[15] Rajendran, S., 2006. Parsing in Tamil: Present state of art. Language in India, 6, p.8.
How did I develop parsers?
this is the context in which I started developing a dependency parser for Tamil
tried two approaches to develop a parser for Tamil:
  Universal Dependencies based parser (UD-based), using the hybrid approach
  Lexical Functional Grammar based parser (LFG-based), using the rule-based approach
also developed support tools to ease the development of the UD-based and LFG-based parsers:
  Part of Speech (POS) tagger (ThamizhiPOSt)
  Morphological analyser (ThamizhiMorph)
Part of Speech Tagger (ThamizhiPOSt)
there are several POS tagsets available: Universal POS (UPOS), Amrita, Bureau of Indian Standards (BIS)
available data:
  AU-KBC Ponniyin Selvan corpus [16] (BIS)
  Amrita tagged corpus [17] (Amrita)
  TDIL has a small tagged corpus for non-Indians (BIS)
  TamilTTB (Universal Dependencies treebank) has around 9K tokens (UPOS)
ThamizhiPOSt:
  uses UPOS, the tagset used in Universal Dependencies
  developed using a machine learning approach
  converted the Amrita corpus to UPOS and trained the tagger on it
  accuracy: 93.57% [18]
[16] http://www.au-kbc.org/nlp/corpusrelease.html
[17] https://www.amrita.edu/publication/tamil-pos-tagging-using-linear-programming
[18] Sarveswaran, K. and Dias, G., 2020. ThamizhiUDp: A Dependency Parser for Tamil. In Proceedings of the 17th International Conference on Natural Language Processing (ICON-2020), IIT Patna, India.
ThamizhiMorph: Morphological Analyser and Generator
a rule-based approach; used nominal and verbal paradigms to write rules for a Finite-State Transducer
mostly handles inflectional morphology
paradigms:
  for verbal paradigms: used Graul’s paradigm [19]
  verb roots were collected from various sources, primarily from Irākavaiyaṅkār [20]
  conjugational forms were obtained from various sources, including Crea [21]
  auxiliary forms were taken from Lehmann [22]
at present:
  there are 3300+ base forms and 300+ conjugations for each base
  generated 1.4M+ simple and 50M+ complex surface forms [23]
[19] K. Graul, Outline of Tamil grammar. Leipzig University, 1855.
[20] M. Irākavaiyaṅkār, ’Viaittiripu viḷakkam’ (conjugation of Tamil verbs) (in Tamil). Eighty year anniversary publication, 1958.
[21] E. Annamalai and Crea Team, A handbook of Tamil Verbal Conjugations, McNeil Technologies, 2009.
[22] Lehmann, Thomas. 1993. A Grammar of Modern Tamil. Pondicherry Institute of Linguistics and Culture, India.
[23] https://www.kaggle.com/sarves/tamilverbs
LFG-based grammar for Tamil
Lexical Functional Grammar is a constraint-based generative grammar [24]
goal of combining linguistic sophistication with computational implementability
primarily has constituency and functional structures; now also extended to capture more complex analyses, such as semantics and prosody
  constituency structure (c-structure): captures surface structure, word order, etc.
  functional structure (f-structure): captures the functions, constraints, argument structure, etc.
at present:
  developed based on 150 sentences taken from the ParGram project [25] and a Grade-1 Tamil textbook
  used ThamizhiMorph to generate the lexicon
  available here: https://clarino.uib.no/iness/xle-web
[24] Kaplan, R.M. and Bresnan, J., 1981. Lexical-functional grammar: A formal system for grammatical representation. Massachusetts Institute of Technology, Center for Cognitive Science.
[25] Butt, Miriam, Tracy Holloway King, Maria-Eugenia Nino, and Frederique Segond. 1999. A Grammar Writer’s Cookbook. Stanford: CSLI Publications.
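The split between c-structure and f-structure can be illustrated with plain data structures. This is a rough sketch under stated assumptions: the English toy sentence, the category labels, the ARGS attribute, and the helper names leaves and missing_arguments are all hypothetical and are not the representation used by the Tamil LFG grammar described here.

```python
# c-structure: a phrase-structure tree as nested tuples (category, child, ...).
C_STRUCTURE = ("S",
               ("NP", ("N", "Ram")),
               ("VP", ("V", "saw"), ("NP", ("N", "Sita"))))

# f-structure: an attribute-value matrix as a nested dict.
F_STRUCTURE = {
    "PRED": "see",
    "ARGS": ["SUBJ", "OBJ"],  # argument structure required by the predicate
    "TENSE": "past",
    "SUBJ": {"PRED": "Ram"},
    "OBJ": {"PRED": "Sita"},
}


def leaves(tree):
    """Read the surface word order off the c-structure."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the category label
        words.extend(leaves(child))
    return words


def missing_arguments(f):
    """List the required grammatical functions that have no value."""
    return [g for g in f.get("ARGS", []) if g not in f]


print(leaves(C_STRUCTURE))             # ['Ram', 'saw', 'Sita']
print(missing_arguments(F_STRUCTURE))  # []
```

Word order lives only in the tree, while the dict records who did what to whom; a language with freer word order can change the former without changing the latter.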
LFG parsing - examples
UD-based grammar for Tamil
used a hybrid approach to develop the parser
created a UD-annotated treebank, using ThamizhiPOSt, ThamizhiMorph, and manual annotation
iteratively trained the parser using a machine learning approach
also tried multilingual learning, along with Telugu and Hindi
training a parser is a structured process, as below:
Creation of Treebanks
Tamil MWTT (together with Prof. Prameswari, CALTS):
  Modern Written Tamil Treebank; uses 536 sentences from the book "A Grammar of Modern Tamil" by Thomas Lehmann
  dependency information annotated (mostly) manually
  available in the UD repository [26]; work in progress
Tamil ThamizhiTB:
  annotated 1300 sentences taken from online sources (somewhat balanced, taken from different types of sources), using a hybrid approach (human + machine)
  different syntactic constructions are considered
[26] https://github.com/UniversalDependencies/UD_Tamil-MWTT/tree/master
Performance
at present:
  have a parser, ThamizhiUDp, with an accuracy of 79%
  covers simple structures, except questions
  available through ThamizhiLIP
also tried multilingual training with Hindi and Telugu; multilingual learning is a technique used when there is little data

Dataset                       LAS (F1 score)
Hindi [27] (1500 sentences)   76.74
Telugu [28] (1050 sentences)  75.73

[27] https://github.com/UniversalDependencies/UD_Hindi-HDTB/tree/master
[28] https://github.com/UniversalDependencies/UD_Telugu-MTG/tree/master
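For readers unfamiliar with the LAS metric in the table, the sketch below computes unlabelled and labelled attachment scores for a single toy sentence. It assumes the gold and predicted analyses share the same tokenisation, in which case the F1 form of LAS reduces to plain accuracy; the function name attachment_scores and the example values are hypothetical and unrelated to the reported results.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Both arguments are equally long lists of (head, relation) pairs, one per token.
    UAS counts tokens with the correct head; LAS also requires the correct relation.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Toy four-token sentence: one wrong label (token 3) and one wrong head (token 4).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "obl")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "nmod")]

uas, las = attachment_scores(gold, pred)
print(uas, las)  # 75.0 50.0
```

LAS can never exceed UAS, because a token with the wrong head counts as an error for both scores.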
Finally:
initial, usable versions of the POS tagger, morphological analyser/generator, and dependency parsers are publicly available
the rule-based LFG parser and the machine learning based UD parser are useful devices for linguistic and computational analysis of our languages
need more data to improve these tools
need a lot more linguistic help
everything is open source for others to build upon; please make use of it
conducting a workshop on UD treebank annotation on 8-10 April 2021
Thank you.
K. Sarveswaran (Sarves)
iamsarves@gmail.com
K. Sarveswaran (iamsarves@gmail.com) Tamil Dependency Parser March 27, 2021 21 / 21

More Related Content

What's hot

монголын нийгэм эдийн засгийн газарзүй
монголын нийгэм эдийн засгийн газарзүймонголын нийгэм эдийн засгийн газарзүй
монголын нийгэм эдийн засгийн газарзүйbatsuuri
 
Нэрийн хуудасны талаар
Нэрийн хуудасны талаарНэрийн хуудасны талаар
Нэрийн хуудасны талаарTemuulen Nyamdorj
 
үг бүлэг сэдвийн давтлага хичээл
үг бүлэг сэдвийн давтлага хичээлүг бүлэг сэдвийн давтлага хичээл
үг бүлэг сэдвийн давтлага хичээлOyuhai1127
 
анхны сэтгэгдэл
анхны сэтгэгдэланхны сэтгэгдэл
анхны сэтгэгдэлBayarmaa Anu
 
Дутмаг хувилах нэр
Дутмаг хувилах нэрДутмаг хувилах нэр
Дутмаг хувилах нэрGe Go
 
Uil ugiin nairuulga 4
Uil ugiin nairuulga 4Uil ugiin nairuulga 4
Uil ugiin nairuulga 4oyunaadorj
 
Lekts 10 Бутралын түүх
Lekts 10 Бутралын түүхLekts 10 Бутралын түүх
Lekts 10 Бутралын түүхDamdin Serdaram
 
монголчуудын угсаа гарал
монголчуудын угсаа гаралмонголчуудын угсаа гарал
монголчуудын угсаа гаралtungalag
 
НОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docx
НОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docxНОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docx
НОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docxRAYB
 
Мэдээлэл боловсруулах аргууд - Нийгмийн судалгаа
Мэдээлэл боловсруулах аргууд - Нийгмийн судалгааМэдээлэл боловсруулах аргууд - Нийгмийн судалгаа
Мэдээлэл боловсруулах аргууд - Нийгмийн судалгааХишигтөгс А.
 
Niigem sudlal-zuun-dasgal-daalgavar-azhil-1
Niigem sudlal-zuun-dasgal-daalgavar-azhil-1Niigem sudlal-zuun-dasgal-daalgavar-azhil-1
Niigem sudlal-zuun-dasgal-daalgavar-azhil-1anartseeveldorj
 
имиж ба нийгмийн харилцаа, имижийн ангилал
имиж ба нийгмийн харилцаа, имижийн ангилалимиж ба нийгмийн харилцаа, имижийн ангилал
имиж ба нийгмийн харилцаа, имижийн ангилалyivo1004
 

What's hot (20)

монголын нийгэм эдийн засгийн газарзүй
монголын нийгэм эдийн засгийн газарзүймонголын нийгэм эдийн засгийн газарзүй
монголын нийгэм эдийн засгийн газарзүй
 
11. uil ug+
11. uil ug+11. uil ug+
11. uil ug+
 
Tileubek
TileubekTileubek
Tileubek
 
Linguagem literária e linguagem não literária
Linguagem literária e linguagem não literáriaLinguagem literária e linguagem não literária
Linguagem literária e linguagem não literária
 
MHHS12
MHHS12MHHS12
MHHS12
 
Нэрийн хуудасны талаар
Нэрийн хуудасны талаарНэрийн хуудасны талаар
Нэрийн хуудасны талаар
 
үг бүлэг сэдвийн давтлага хичээл
үг бүлэг сэдвийн давтлага хичээлүг бүлэг сэдвийн давтлага хичээл
үг бүлэг сэдвийн давтлага хичээл
 
үзүүлэн
үзүүлэнүзүүлэн
үзүүлэн
 
анхны сэтгэгдэл
анхны сэтгэгдэланхны сэтгэгдэл
анхны сэтгэгдэл
 
Дутмаг хувилах нэр
Дутмаг хувилах нэрДутмаг хувилах нэр
Дутмаг хувилах нэр
 
Mongolchuudiin garal ugsaa
Mongolchuudiin garal ugsaaMongolchuudiin garal ugsaa
Mongolchuudiin garal ugsaa
 
Uil ugiin nairuulga 4
Uil ugiin nairuulga 4Uil ugiin nairuulga 4
Uil ugiin nairuulga 4
 
Lekts 10 Бутралын түүх
Lekts 10 Бутралын түүхLekts 10 Бутралын түүх
Lekts 10 Бутралын түүх
 
монголчуудын угсаа гарал
монголчуудын угсаа гаралмонголчуудын угсаа гарал
монголчуудын угсаа гарал
 
НОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docx
НОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docxНОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docx
НОМЫН ТЭМДЭГЛЭЛ - Ичиго Ичие.docx
 
Мэдээлэл боловсруулах аргууд - Нийгмийн судалгаа
Мэдээлэл боловсруулах аргууд - Нийгмийн судалгааМэдээлэл боловсруулах аргууд - Нийгмийн судалгаа
Мэдээлэл боловсруулах аргууд - Нийгмийн судалгаа
 
Niigem sudlal-zuun-dasgal-daalgavar-azhil-1
Niigem sudlal-zuun-dasgal-daalgavar-azhil-1Niigem sudlal-zuun-dasgal-daalgavar-azhil-1
Niigem sudlal-zuun-dasgal-daalgavar-azhil-1
 
Altai 1
Altai 1Altai 1
Altai 1
 
Elementos de textualidade
Elementos de textualidadeElementos de textualidade
Elementos de textualidade
 
имиж ба нийгмийн харилцаа, имижийн ангилал
имиж ба нийгмийн харилцаа, имижийн ангилалимиж ба нийгмийн харилцаа, имижийн ангилал
имиж ба нийгмийн харилцаа, имижийн ангилал
 

Similar to Developing Dependency Parsers for Tamil

Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...Nakul Sharma
 
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKijnlc
 
Natural Language Interface for Java Programming: Survey
Natural Language Interface for Java Programming: SurveyNatural Language Interface for Java Programming: Survey
Natural Language Interface for Java Programming: Surveyrahulmonikasharma
 
September 2022: Top 10 Read Articles in Natural Language Computing
September 2022: Top 10 Read Articles in Natural Language ComputingSeptember 2022: Top 10 Read Articles in Natural Language Computing
September 2022: Top 10 Read Articles in Natural Language Computingkevig
 
Parsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function TaggingParsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function Taggingkevig
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionaryEditor IJMTER
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognitionVikas Dongre
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...ijnlc
 
Integrating natural language processing and software engineering
Integrating natural language processing and software engineeringIntegrating natural language processing and software engineering
Integrating natural language processing and software engineeringNakul Sharma
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION cscpconf
 
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017kevig
 
Natural Language Processing for Tamil and Sinhala
Natural Language Processing for Tamil and SinhalaNatural Language Processing for Tamil and Sinhala
Natural Language Processing for Tamil and SinhalaKengatharaiyer Sarveswaran
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioDeep Learning Italia
 

Similar to Developing Dependency Parsers for Tamil (20)

Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...
 
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
 
Natural Language Interface for Java Programming: Survey
Natural Language Interface for Java Programming: SurveyNatural Language Interface for Java Programming: Survey
Natural Language Interface for Java Programming: Survey
 
September 2022: Top 10 Read Articles in Natural Language Computing
September 2022: Top 10 Read Articles in Natural Language ComputingSeptember 2022: Top 10 Read Articles in Natural Language Computing
September 2022: Top 10 Read Articles in Natural Language Computing
 
Parsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function TaggingParsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function Tagging
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse Dictionary
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognition
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...
 
Integrating natural language processing and software engineering
Integrating natural language processing and software engineeringIntegrating natural language processing and software engineering
Integrating natural language processing and software engineering
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
 
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
 
Ny3424442448
Ny3424442448Ny3424442448
Ny3424442448
 
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
 
Natural Language Processing for Tamil and Sinhala
Natural Language Processing for Tamil and SinhalaNatural Language Processing for Tamil and Sinhala
Natural Language Processing for Tamil and Sinhala
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 

More from Kengatharaiyer Sarveswaran

Department of Education - Northern Province - Grade 5 paper
Department of Education - Northern Province - Grade 5 paperDepartment of Education - Northern Province - Grade 5 paper
Department of Education - Northern Province - Grade 5 paperKengatharaiyer Sarveswaran
 
Concept paper for Educational Management Information System
Concept paper for Educational Management Information SystemConcept paper for Educational Management Information System
Concept paper for Educational Management Information SystemKengatharaiyer Sarveswaran
 
21ம் நூற்றாண்டில் இணையக் கல்வியின் முக்கியத்துவம்
21ம் நூற்றாண்டில் இணையக் கல்வியின் முக்கியத்துவம்21ம் நூற்றாண்டில் இணையக் கல்வியின் முக்கியத்துவம்
21ம் நூற்றாண்டில் இணையக் கல்வியின் முக்கியத்துவம்Kengatharaiyer Sarveswaran
 
Developing Dependency Parsers for Tamil

  • 1. Developing Dependency Parsers for Tamil - March 27, 2021
      K. Sarveswaran (Sarves), University of Moratuwa & University of Jaffna, Sri Lanka. iamsarves@gmail.com
      K. Sarveswaran (iamsarves@gmail.com) Tamil Dependency Parser March 27, 2021 1 / 21
  • 2. Overview
      1 Introduction
      2 Background
          - Natural Language Grammars
          - Syntactic parsing
          - Treebanks
          - Universal Dependency Treebank
          - Dependency Parsers
          - Approaches for Developing parsers
      3 Dependency parsing of Tamil
      4 How did I develop parsers?
      5 ThamizhiPOSt: Part of Speech tagger
      6 ThamizhiMorph: Morphological Analyser and Generator
      7 LFG-based grammar for Tamil
      8 UD-based grammar for Tamil
      9 Creation of Treebank
      10 Conclusion
  • 3. Introduction
      - language processing technologies are now part of our everyday life
      - tech giants are investing heavily in language technologies
      - interest in local language computing has been increasing in recent times
      - Tamil can still be considered a low-resource language, based on the number of publicly available usable tools and resources
      - machine learning/deep learning approaches are growing very fast
      - dependency parsers are crucial tools for syntactic analysis
  • 4. Natural language grammars
      - phrase structure grammar (constituency grammar / context-free grammar / generative grammar) and dependency grammar are the two popular grammars used to model natural languages [1]
      - there are also several derivations of these two, for instance Lexical Functional Grammar
      - phrase structure grammar: good for languages like English, where the order of words matters
      - dependency grammar: good for languages that are morphologically rich and have relatively free word order [1]
      [1] Jurafsky, D. and Martin, J.H., 2008. Speech and Language Processing: An introduction to speech recognition, computational linguistics and natural language processing. Upper Saddle River, NJ: Prentice Hall.
  • 5. Syntactic parsing
      - mapping a string of words to its parse tree is called syntactic parsing [2]
      [2] parse = to separate a sentence into grammatical parts - Cambridge dictionary
  • 6. Treebanks
      - a bank of syntactically (and possibly also semantically) annotated sentences, that is, syntactically parsed sentences
      - for instance:
          - Penn Treebank [3]: a phrase structure treebank
          - Universal Dependency Treebank [4]: a dependency treebank
      [3] https://catalog.ldc.upenn.edu/LDC99T42
      [4] https://universaldependencies.org/
  • 7. Universal Dependency (UD) Treebank
      - there are several schemes for annotating dependencies: AnnCorra [5], PDT [6]
      - Universal Dependency Treebank [7] is a widely used scheme for machine language processing
          - cross-linguistically consistent treebank annotation for many languages
          - facilitates multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective
          - 183 treebanks in 104 languages, as of November 2020
          - widely used for parsing; shared tasks and workshops are organised annually
      [5] Bharati, A., Sangal, R., Sharma, D.M. and Bai, L., 2006. Anncorra: Annotating corpora guidelines for pos and chunk annotation for indian languages. LTRC-TR31, pp.1-38.
      [6] Hajic, J., Vidová-Hladká, B. and Pajas, P., 2001, December. The prague dependency treebank: Annotation structure and support. In Proceedings of the IRCS Workshop on Linguistic Databases (pp. 105-114).
      [7] Nivre, J., De Marneffe, M.C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N. and Tsarfaty, R., 2016, May. Universal dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 1659-1666).
  • 8. Universal Dependency Treebank
      - consists of POS, Lemma, Morphology, and Dependency annotations
      - arranged in CoNLL-U format, as shown in Figure-1
      - the scheme is amended to accommodate language change
      [Figure-1] [Figure-2]
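The CoNLL-U layout mentioned on slide 8 can be illustrated with a small reader. This is a minimal sketch, not the tooling used in the talk: the ten column names follow the published CoNLL-U format, while the function name `parse_conllu_sentence` and the English sample annotation (built on the "Ram saw Sita" example from slide 9) are assumptions made for illustration only.

```python
# The ten tab-separated columns of a CoNLL-U token line.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# Illustrative sample only; the annotation values are assumed, not taken from a treebank.
SAMPLE = """\
# text = Ram saw Sita
1\tRam\tRam\tPROPN\t_\tNumber=Sing\t2\tnsubj\t_\t_
2\tsaw\tsee\tVERB\t_\tTense=Past|VerbForm=Fin\t0\troot\t_\t_
3\tSita\tSita\tPROPN\t_\tNumber=Sing\t2\tobj\t_\t_
"""


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dictionaries."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(CONLLU_FIELDS, cols))
        # Multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped here.
        if not token["id"].isdigit():
            continue
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        # FEATS is a "|"-separated list of Name=Value pairs, or "_" when empty.
        if token["feats"] == "_":
            token["feats"] = {}
        else:
            token["feats"] = dict(pair.split("=", 1)
                                  for pair in token["feats"].split("|"))
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    for tok in parse_conllu_sentence(SAMPLE):
        print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

Each token carries the POS tag (`upos`), lemma, morphological features (`feats`), and the dependency annotation as a head index plus relation label, which are the four annotation layers the slide lists.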
  • 9. Dependency Parser
      - software which gives syntactic parses of a given sentence based on a dependency formalism
      - Why:
          - useful for the development of applications for grammar checking, semantic interpretation, question answering, and machine translation
          - useful to study the structure of languages / diachronic and synchronic changes
      - Challenges:
          - one needs a lot of linguistic knowledge to create treebanks
          - time consuming; (gold) treebanks are usually created by hand
          - there are still a lot of debates on syntax, even for English [8]
          - ambiguities are always a problem:
              - attachment: Ram saw Sita [with a telescope]
              - coordination: old women and men
      [8] https://universaldependencies.org/workgroups/core.html
  • 10. Treebanks - Tamil
      - only one treebank is publicly available: Tamil PDT [9]
      - TamilPDT was then migrated to UD (called TamilTTB) in November 2015, using scripts
          - no change has been made since then
          - used by several non-Tamil teams for parsing (IWPT2020 [10])
      - TamilTTB has several issues:
          - tokenisation: for instance, words are broken inappropriately
          - dependency issues: for instance, a dative can be a subject, oblique, or indirect object in Tamil; however, it is mostly marked as object
      [9] Ramasamy, L. and Žabokrtský, Z., 2011, February. Tamil dependency parsing: results using rule based and corpus based approaches. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 82-95). Springer, Berlin, Heidelberg.
      [10] https://universaldependencies.org/iwpt20/enhancements_in_treebanks.html
  • 11. Approaches for Developing parsers
      - rule-based approach:
          - need to write a lot of rules
          - success and coverage depend heavily on the lexicon
          - useful for (small) domain-specific parsing
      - hybrid approach:
          - create annotated data
          - train a computer program with the annotated data
          - annotate more data using the trained program, and repeat iteratively until the accuracy is good
          - useful for languages like Tamil, where we do not have a lot of annotated data
          - more robust than the rule-based approach
      - machine learning based / unsupervised learning:
          - research is still at a preliminary stage
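The hybrid approach on slide 11 is an iterative loop: annotate a seed set, train, let the trained program pre-annotate more data, correct it, and retrain. The following is a minimal sketch of that loop under stated assumptions. The function `bootstrap_treebank` and its parameters (`train`, `evaluate`, `correct`, `batch_size`, `target`, `max_rounds`) are hypothetical placeholders for whatever trainer, scorer, and human correction step is actually used; none of them come from the talk.

```python
def bootstrap_treebank(seed, unlabelled, train, evaluate, correct,
                       batch_size=100, target=0.8, max_rounds=10):
    """Grow a treebank iteratively (hypothetical helper, for illustration).

    seed       -- hand-annotated sentences to start from
    unlabelled -- raw sentences still to be annotated
    train      -- callable: treebank -> model, where model(sentence) -> draft annotation
    evaluate   -- callable: model -> accuracy in [0, 1] on held-out data
    correct    -- callable: draft annotations -> corrected annotations (the human step)
    """
    treebank = list(seed)
    pool = list(unlabelled)
    model = train(treebank)
    score = evaluate(model)
    rounds = 0
    while score < target and pool and rounds < max_rounds:
        # Take the next batch of raw sentences from the pool.
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Machine step: the current model pre-annotates the batch.
        drafts = [model(sentence) for sentence in batch]
        # Human step: annotators fix the drafts before they join the treebank.
        treebank.extend(correct(drafts))
        # Retrain on the enlarged treebank and measure again.
        model = train(treebank)
        score = evaluate(model)
        rounds += 1
    return model, treebank, score
```

The loop stops when the target accuracy is reached, the raw pool is exhausted, or the round limit is hit. Keeping the human correction step inside the loop is what distinguishes this from plain self-training, where uncorrected machine output would be fed back into training.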
  • 12. Dependency Parser for Tamil
      - a shallow parser for Tamil; it identifies phrases with an f-measure of 66.6; tool not found [11]
      - a dependency parser for Tamil; score 57.50; no data/no tools found. It uses its own specification for annotation [12]
      - a dependency parser to parse an ancient poetic text in Tamil; no results reported, no tools found [13]
      - an SVM-based dependency parser; unlabelled attachment score of 76.26; no tools found [14]
      - there is a survey paper on parsing in Tamil [15]
      [11] Ariaratnam, I., Weerasinghe, A.R. and Liyanage, C., 2014, December. A shallow parser for Tamil. In 2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer) (pp. 197-203). IEEE.
      [12] Selvam, M., Natarajan, A.M. and Thangarajan, R., 2009. Structural parsing of natural language text in Tamil Language using dependency model. International Journal of Computer Processing of Languages, 22(02n03), pp.237-256.
      [13] Dhanalakshmi, V., Kumar, M.A. and Murugesan, C., 2012. Dependency Parser for Tamil classical literature-Kurunthokai. INFITT
      [14] Green, N., Ramasamy, L. and Žabokrtský, Z., 2012. Using an SVM ensemble system for improved Tamil dependency parsing. In Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (pp. 72-77).
      [15] Rajendran, S., 2006. Parsing in tamil: Present state of art. Language in India, 6, p.8.
  • 13. How did I develop parsers?
      - this is the context in which I started developing a dependency parser for Tamil
      - tried two approaches to develop a parser for Tamil:
          - Universal Dependency parser (UD-based), using the hybrid approach
          - Lexical Functional Grammar based parser (LFG-based), using the rule-based approach
      - also developed support tools to ease the UD-based and LFG-based development:
          - Part of Speech (POS) tagger (ThamizhiPOSt)
          - Morphological analyser (ThamizhiMorph)
  • 14. Part of Speech Tagger (ThamizhiPOSt)
      - there are several POS tagsets available: Universal POS (UPOS), Amrita, Bureau of Indian Standards (BIS)
      - available data:
          - AU-KBC Ponniyin Selvan corpus [16] (BIS)
          - Amrita tagged corpus [17] (Amrita)
          - TDIL has a small tagged corpus for non-Indians (BIS)
          - TamilTTB (Universal Dependency Treebank) has around 9K tokens (UPOS)
      - ThamizhiPOSt uses UPOS, the tagset used in Universal Dependency
      - developed using a machine learning approach
      - converted the Amrita corpus to UPOS, and trained the program on it
      - accuracy: 93.57% [18]
      [16] http://www.au-kbc.org/nlp/corpusrelease.html
      [17] https://www.amrita.edu/publication/tamil-pos-tagging-using-linear-programming
      [18] Sarveswaran, K, Gihan Dias. 2020. ThamizhiUDp: A Dependency Parser for Tamil. In Proceedings of the 17th International Conference on Natural Language Processing (ICON-2020), IIT Patna, India.
  • 15. ThamizhiMorph: Morphological Analyser and Generator
      - a rule-based approach; used nominal and verbal paradigms to write rules using a Finite-State Transducer
      - mostly handles inflectional morphology
      - paradigms:
          - for verbal paradigms: used Graul's paradigm [19]
          - collected verb roots from various sources, primarily from Irākavaiyaṅkār [20]
          - conjugational forms were obtained from various sources, including Crea [21]
          - auxiliary forms were taken from Lehmann [22]
      - at present:
          - there are 3300+ base forms and 300+ conjugations for each base
          - generated 1.4M+ simple and 50M+ complex surface forms [23]
      [19] K. Graul, Outline of Tamil grammar. Leipzig University, 1855
      [20] M. Irākavaiyaṅkār, 'Viaittiripu viḷakkam' (conjugation of Tamil verbs) (in Tamil). Eighty year anniversary publication, 1958.
      [21] E. Annamalai and Crea Team, A handbook of Tamil Verbal Conjugations, MCNeil Technologies, 2009
      [22] Lehmann, Thomas. 1993. A Grammar of Modern Tamil. Pondicherry Institute of Linguistics and Culture, India.
      [23] https://www.kaggle.com/sarves/tamilverbs
  • 16. LFG-based grammar for Tamil
      - Lexical Functional Grammar:
          - a constraint-based grammar, a generative grammar [24]
          - has the goal of combining linguistic sophistication with computational implementability
          - primarily has constituency and functional structures; now also extended to capture more complex analyses, like semantics, prosody etc.
          - constituency structure (c-structure): captures surface structure, word order etc.
          - functional structure (f-structure): captures the functions, constraints, argument structure etc.
      - at present:
          - it is developed based on 150 sentences taken from the ParGram project [25] and a Grade-1 Tamil textbook
          - used ThamizhiMorph to generate the lexicon
          - available here: https://clarino.uib.no/iness/xle-web
      [24] Kaplan, R.M. and Bresnan, J., 1981. Lexical-functional grammar: A formal system for grammatical representation. Massachusetts Institute Of Technology, Center For Cognitive Science.
      [25] Butt, Miriam, Tracy Holloway King, Maria-Eugenia Nino, and Frederique Segond. 1999. A Grammar Writer's Cookbook. Stanford: CSLI Publications.
  • 17. LFG parsing - examples
  • 18. UD-based grammar for Tamil
      - used a hybrid approach to develop the parser
      - created UD annotated treebank, using ThamizhiPOSt, ThamizhiMorph and by hand
      - iteratively trained the parser using machine learning approach
      - also tried multilingual learning, along with Telugu and Hindi
      - training a parser is a structured process, as below:
  • 19. Creation of Treebanks
      - Tamil MWTT (together with Prof. Prameswari, CALTS):
          - Tamil Modern Written Tamil Treebank; used 536 sentences from the book "Grammar of Modern Tamil" by Thomas Lehmann
          - dependency information annotated (mostly) manually
          - available in the UD repository [26]; work in progress
      - Tamil ThamizhiTB:
          - annotated 1300 sentences taken from online sources (somewhat balanced, taken from different types of sources), using a hybrid approach (Human + Machine)
          - different syntactic constructions are considered
      [26] https://github.com/UniversalDependencies/UD_Tamil-MWTT/tree/master
  • 20. Performance
      - at present:
          - have a parser, ThamizhiUDp, with an accuracy of 79%
          - covers simple structures, except questions
          - available through ThamizhiLIP
      - also tried multilingual training with Hindi and Telugu. Multilingual learning is a technique used when there is little data.

      Dataset                         LAS (F1 score)
      Hindi [27] (1500 sentences)     76.74
      Telugu [28] (1050 sentences)    75.73

      [27] https://github.com/UniversalDependencies/UD_Hindi-HDTB/tree/master
      [28] https://github.com/UniversalDependencies/UD_Telugu-MTG/tree/master
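The LAS figures on slide 20 refer to the labelled attachment score. As a rough sketch of what the metric measures, and assuming gold and predicted analyses share the same tokenisation (in which case an F1 over tokens coincides with plain accuracy), it can be computed as below. The function name `attachment_scores` and the toy data are assumptions for illustration; this is not the evaluation script used for the reported numbers.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two equally long lists of (head, deprel) pairs.

    UAS counts tokens whose predicted head is correct;
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    head_hits = 0
    labelled_hits = 0
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, predicted):
        if g_head == p_head:
            head_hits += 1
            if g_rel == p_rel:
                labelled_hits += 1
    total = len(gold)
    return head_hits / total, labelled_hits / total


if __name__ == "__main__":
    # Toy three-token sentence: all heads correct, one label wrong.
    gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "obl")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In the toy case every head is right but one relation label differs, so the unlabelled score is perfect while the labelled score drops to two out of three.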
  • 21. Finally:
      - initial and usable versions of the POS tagger, Morphological analyser/generator, and Dependency parsers are publicly available
      - the rule-based LFG parser and the machine learning based UD parsers are useful devices for linguistic and computational analysis of our languages
      - need more data to improve these tools
      - need a lot more linguistic help
      - everything is open source for others to build upon; please make use of them
      - conducting a workshop on UD treebank annotation on 8-10 April, 2021
      Thank you. K. Sarveswaran (Sarves) iamsarves@gmail.com