SlideShare a Scribd company logo
 About The paper
 There has been substantial recent interest in annotation schemes that can be applied consistently to
many languages
- The Universal Dependencies (UD) project seeks to introduce a cross-linguistically applicable part-
of-speech tagset, feature inventory, and set of dependency relations as well as a large number of
uniformly annotated treebank
 The paper(2015) said universal dependency for Finnish is one of ten language in recent first releas
of UD project treebank data.
 This paper details the mapping of previously introduce annotation to the UD standard
 This paper additionally presents parsing experiments.
- comparing the performance of a state-of-the-art parser trained on a language specific annotation schema to
performance on the corresponding UD annotation.
- Tool and resources mentioned in this paper are available under open licenses from http://bionlp.utu.fi/ud-finnish.html
 Conversion From Turku Dependence Treebank to Universal Dependencies
 What is the treebank(Wikipedia)?
- in linguistics, a treebank is a parsed text corpus that annotates syntactic and semantic sentences structure.
Most syntactic treebank
annotate variants of either
phrase structure(left) or
dependency structure(right)
 Universal dependency builds on the Google universal part-of-speech(POS) tagset, The interest interlingua of
morphosyntactic features, and Stanford Dependency
 Universal dependency defines also a treebank storage format, CoNLL-U.
 In this paper, they presents the adaptation of the UD guidelines to Finnish and the creation of the UD Finnish
treebank by conversion of the previously introduced Turku Dependency Treebank(TDT).
 They present the comparison of parsing score between language-specific treebank annotation to that of UD
annotation
Most syntactic treebank
annotate variants of either
phrase structure(left) or
dependency structure(right)
 The initial stage of the work is identifying similarities and difference between the TDT and UD
annotation guidelines, adapting the general UD guidelines and planning the implementation of the
conversion .
- Technically, the conversion is on pipeline of processing components, each of which consumed and produced
CoNLL-U formatted data.
 In this paper, TDT is the source for the conversion. It consists of 15,000 sentences(200,000 words)
draw from a variety of sources and annotated in a Finnish-specific version of the Stanford
Dependencies(SD) scheme.
 They also addressed several instances.
- tokenization differed from UD specifications.
- corrected a small number of sentence-splitting errors.
- updated the lemmas to improve both treebank-internal consistency and conformance with the UD specification
 They further introduced a fully manually annotated morphology layer, replacing the automatically
generated morphological annotation of the initial data.
 The Universal specification defines 17 POS tags, and requires that all conforming treebank use only
these tags
- While no language-specific POS tags can thus be defined in the primary POS annotation, the CoNLL-U format
allows a secondary POS tag to be assigned to each word to preserve treebank-specific information.
 In the case of conjunctions from subordinating conjunctions (CONJ and SCONJ in UD, respectively).
 Just four TDT tags, marking pronouns, punctuation, symbols and verbs, required further information
to resolve correctly.
 : TDT-style vs UD style for syntax and part-of-speech annotation for a Finnish sentence.
 The Universal Dependencies specification defines a set of 17 widely attested Morphological features
such as Case, Person, Number, Voice and Mood.
- However, by contrast to the POS tag annotation, the specification allows conforming treebanks to
introduce language specific features that are not included in this universal inventory.
 UD defines as set of 40 broadly applicable dependency relation, further allowing language-specific
subtypes of these to be defined to meet the needs of specifics resources.
- Unlike the fairly straightforward mapping from morphological annotation, the conversion from TDT dependency
annotation to UD often required not only relabeling types, abut also changes to the tree structure.
 The UD syntactic annotation is based on the universal Stanford Dependencies(SD) scheme.
 The total number of dependency relation types defined in UD Finnish is 43, consisting of 32 universal
relations and 11 language-specific sub-types.
 In the original TDD annotation, 46 dependency types are used, with an additional 4 types to mark non-
tree structures used in the second annotation layer of TDT.
 Null token for omitted element in sentence BUT In UD, it is not used. Instead In UD, remnant relation.
 TDT-style(top) vs UD-style(bottom)
 The figure below shows tokens statistics for the 10 languages for which treebanks were included in
the initial UD data release.
- with over 200,000 tokens, The UD Finnish treebank is in a mid-size cluster among the UD version 1 languages
together with German, English, and Italian.
 As baseline, we consider the most recent Finnish dependency distribution parser trained and
evaluated on the original distribution of TDT.
- they said that the conversion to UD has had no major effect on the parsing accuracy.
 Soft constraint approach is where tags given by marmot are preserved, and OMorFi is only used to
select the lemma.
 Hard-pos constraint is where the predicted POS can be found among the analyses given by OMoriFi
 They have presented Universal Dependencies(UD) for Finnish, detailing the application of general UD
guidelines to the annotation of parts-of-speech, morphological feature, and dependency relations in
Finish and introducing a conversion from the previously released TurKu Dependency Treebank corpus
into the UD Finnish treebank released in the first UD data release.
 UniversalDependencies.org
 The paper – Universal Dependencies for Finnish
Universal dependencies for Finnish

More Related Content

What's hot

Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsNear Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Liwei Ren任力偉
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
Guy De Pauw
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
Lifeng (Aaron) Han
 
Sgml and xml
Sgml and xmlSgml and xml
Sgml and xml
Jaya Kumari
 
Bt0078
Bt0078Bt0078
Bt0078
smumbahelp
 
GNU Internationalization Presentation
GNU Internationalization PresentationGNU Internationalization Presentation
GNU Internationalization Presentation
Joe Turner
 
Pmm05 16
Pmm05 16Pmm05 16
Pmm05 16
Rohit Luthra
 
Textmining
TextminingTextmining
Textmining
sidhunileshwar
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for TranslationRIILP
 
Hw6 interpreter iterator GoF
Hw6 interpreter iterator GoFHw6 interpreter iterator GoF
Hw6 interpreter iterator GoF
Edison Lascano
 
XML Introduction
XML IntroductionXML Introduction
XML Introduction
Bikash chhetri
 
GRDDL: A Pictorial Approach
GRDDL: A Pictorial ApproachGRDDL: A Pictorial Approach
GRDDL: A Pictorial ApproachChimezie Ogbuji
 
Zrm
ZrmZrm
6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine TranslationRIILP
 
A Proposition Bank of Urdu
A Proposition Bank of UrduA Proposition Bank of Urdu
A Proposition Bank of Urdu
Algoscale Technologies Inc.
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
RIILP
 
Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)
Jorge Baptista
 

What's hot (20)

Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsNear Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
 
Unit 2
Unit 2Unit 2
Unit 2
 
Sgml and xml
Sgml and xmlSgml and xml
Sgml and xml
 
Bt0078
Bt0078Bt0078
Bt0078
 
GNU Internationalization Presentation
GNU Internationalization PresentationGNU Internationalization Presentation
GNU Internationalization Presentation
 
Pmm05 16
Pmm05 16Pmm05 16
Pmm05 16
 
Textmining
TextminingTextmining
Textmining
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation
 
Hw6 interpreter iterator GoF
Hw6 interpreter iterator GoFHw6 interpreter iterator GoF
Hw6 interpreter iterator GoF
 
Normalization
NormalizationNormalization
Normalization
 
XML Introduction
XML IntroductionXML Introduction
XML Introduction
 
GRDDL: A Pictorial Approach
GRDDL: A Pictorial ApproachGRDDL: A Pictorial Approach
GRDDL: A Pictorial Approach
 
Zrm
ZrmZrm
Zrm
 
6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation
 
Xml2
Xml2Xml2
Xml2
 
A Proposition Bank of Urdu
A Proposition Bank of UrduA Proposition Bank of Urdu
A Proposition Bank of Urdu
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
 
Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)
 

Similar to Universal dependencies for Finnish

Lemon at-mlw3
Lemon at-mlw3Lemon at-mlw3
Lemon at-mlw3
monnetproject
 
The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
The Distributed Ontology Language (DOL): Use Cases, Syntax, and ExtensibilityThe Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
Christoph Lange
 
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Algoscale Technologies Inc.
 
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Christoph Lange
 
Semantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditorSemantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditor
Cognitum
 
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Christoph Lange
 
MOLTO poster for ACL 2010, Uppsala Sweden
MOLTO poster for ACL 2010, Uppsala SwedenMOLTO poster for ACL 2010, Uppsala Sweden
MOLTO poster for ACL 2010, Uppsala Sweden
Olga Caprotti
 
Treebank annotation
Treebank annotationTreebank annotation
Treebank annotation
Mohit Jasapara
 
Chapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfChapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdf
JemalNesre1
 
Customizable Segmentation of
Customizable Segmentation ofCustomizable Segmentation of
Customizable Segmentation ofAndi Wu
 
Gellish A Standard Data And Knowledge Representation Language And Ontology
Gellish   A Standard Data And Knowledge Representation Language And OntologyGellish   A Standard Data And Knowledge Representation Language And Ontology
Gellish A Standard Data And Knowledge Representation Language And Ontology
Andries_vanRenssen
 
Are Data Models Superfluous Nov2003
Are Data Models Superfluous Nov2003Are Data Models Superfluous Nov2003
Are Data Models Superfluous Nov2003
Andries_vanRenssen
 
W17 5406
W17 5406W17 5406
W17 5406
bonbon93
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
gerogepatton
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
IJITE
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
ijrap
 
Lit mtap
Lit mtapLit mtap
Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...
ijnlc
 

Similar to Universal dependencies for Finnish (20)

Lemon at-mlw3
Lemon at-mlw3Lemon at-mlw3
Lemon at-mlw3
 
The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
The Distributed Ontology Language (DOL): Use Cases, Syntax, and ExtensibilityThe Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
 
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Ali...
 
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
 
Semantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditorSemantic Rules Representation in Controlled Natural Language in FluentEditor
Semantic Rules Representation in Controlled Natural Language in FluentEditor
 
ijcai11
ijcai11ijcai11
ijcai11
 
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
 
MOLTO poster for ACL 2010, Uppsala Sweden
MOLTO poster for ACL 2010, Uppsala SwedenMOLTO poster for ACL 2010, Uppsala Sweden
MOLTO poster for ACL 2010, Uppsala Sweden
 
Treebank annotation
Treebank annotationTreebank annotation
Treebank annotation
 
Chapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdfChapter 2 Text Operation and Term Weighting.pdf
Chapter 2 Text Operation and Term Weighting.pdf
 
Customizable Segmentation of
Customizable Segmentation ofCustomizable Segmentation of
Customizable Segmentation of
 
Gellish A Standard Data And Knowledge Representation Language And Ontology
Gellish   A Standard Data And Knowledge Representation Language And OntologyGellish   A Standard Data And Knowledge Representation Language And Ontology
Gellish A Standard Data And Knowledge Representation Language And Ontology
 
Are Data Models Superfluous Nov2003
Are Data Models Superfluous Nov2003Are Data Models Superfluous Nov2003
Are Data Models Superfluous Nov2003
 
nlp (1).pptx
nlp (1).pptxnlp (1).pptx
nlp (1).pptx
 
W17 5406
W17 5406W17 5406
W17 5406
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
 
Lit mtap
Lit mtapLit mtap
Lit mtap
 
Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...
 

More from hyunyoung Lee

(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART
(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART
(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART
hyunyoung Lee
 
(Paper Seminar) Cross-lingual_language_model_pretraining
(Paper Seminar) Cross-lingual_language_model_pretraining(Paper Seminar) Cross-lingual_language_model_pretraining
(Paper Seminar) Cross-lingual_language_model_pretraining
hyunyoung Lee
 
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
hyunyoung Lee
 
(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...
(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...
(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...
hyunyoung Lee
 
(Paper seminar)Learned in Translation: Contextualized Word Vectors
(Paper seminar)Learned in Translation: Contextualized Word Vectors(Paper seminar)Learned in Translation: Contextualized Word Vectors
(Paper seminar)Learned in Translation: Contextualized Word Vectors
hyunyoung Lee
 
(Paper seminar)Retrofitting word vector to semantic lexicons
(Paper seminar)Retrofitting word vector to semantic lexicons(Paper seminar)Retrofitting word vector to semantic lexicons
(Paper seminar)Retrofitting word vector to semantic lexicons
hyunyoung Lee
 
(Paper seminar)real-time personalization using embedding for search ranking a...
(Paper seminar)real-time personalization using embedding for search ranking a...(Paper seminar)real-time personalization using embedding for search ranking a...
(Paper seminar)real-time personalization using embedding for search ranking a...
hyunyoung Lee
 
Neural machine translation inspired binary code similarity comparison beyond ...
Neural machine translation inspired binary code similarity comparison beyond ...Neural machine translation inspired binary code similarity comparison beyond ...
Neural machine translation inspired binary code similarity comparison beyond ...
hyunyoung Lee
 
Language grounding and never-ending language learning
Language grounding and never-ending language learningLanguage grounding and never-ending language learning
Language grounding and never-ending language learning
hyunyoung Lee
 
Glove global vectors for word representation
Glove global vectors for word representationGlove global vectors for word representation
Glove global vectors for word representation
hyunyoung Lee
 
Spam text message filtering by using sen2 vec and feedforward neural network
Spam text message filtering by using sen2 vec and feedforward neural networkSpam text message filtering by using sen2 vec and feedforward neural network
Spam text message filtering by using sen2 vec and feedforward neural network
hyunyoung Lee
 
Word embedding method of sms messages for spam message filtering
Word embedding method of sms messages for spam message filteringWord embedding method of sms messages for spam message filtering
Word embedding method of sms messages for spam message filtering
hyunyoung Lee
 
Memory Networks
Memory NetworksMemory Networks
Memory Networks
hyunyoung Lee
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
hyunyoung Lee
 
How to use tensorflow
How to use tensorflowHow to use tensorflow
How to use tensorflow
hyunyoung Lee
 
Natural language processing open seminar For Tensorflow usage
Natural language processing open seminar For Tensorflow usageNatural language processing open seminar For Tensorflow usage
Natural language processing open seminar For Tensorflow usage
hyunyoung Lee
 
large-scale and language-oblivious code authorship identification
large-scale and language-oblivious code authorship identificationlarge-scale and language-oblivious code authorship identification
large-scale and language-oblivious code authorship identification
hyunyoung Lee
 
Word2Vec
Word2VecWord2Vec
Word2Vec
hyunyoung Lee
 
NLTK practice with nltk book
NLTK practice with nltk bookNLTK practice with nltk book
NLTK practice with nltk book
hyunyoung Lee
 
Skip gram and cbow
Skip gram and cbowSkip gram and cbow
Skip gram and cbow
hyunyoung Lee
 

More from hyunyoung Lee (20)

(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART
(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART
(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART
 
(Paper Seminar) Cross-lingual_language_model_pretraining
(Paper Seminar) Cross-lingual_language_model_pretraining(Paper Seminar) Cross-lingual_language_model_pretraining
(Paper Seminar) Cross-lingual_language_model_pretraining
 
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
 
(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...
(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...
(Paper Seminar short version) BART: Denoising Sequence-to-Sequence Pre-traini...
 
(Paper seminar)Learned in Translation: Contextualized Word Vectors
(Paper seminar)Learned in Translation: Contextualized Word Vectors(Paper seminar)Learned in Translation: Contextualized Word Vectors
(Paper seminar)Learned in Translation: Contextualized Word Vectors
 
(Paper seminar)Retrofitting word vector to semantic lexicons
(Paper seminar)Retrofitting word vector to semantic lexicons(Paper seminar)Retrofitting word vector to semantic lexicons
(Paper seminar)Retrofitting word vector to semantic lexicons
 
(Paper seminar)real-time personalization using embedding for search ranking a...
(Paper seminar)real-time personalization using embedding for search ranking a...(Paper seminar)real-time personalization using embedding for search ranking a...
(Paper seminar)real-time personalization using embedding for search ranking a...
 
Neural machine translation inspired binary code similarity comparison beyond ...
Neural machine translation inspired binary code similarity comparison beyond ...Neural machine translation inspired binary code similarity comparison beyond ...
Neural machine translation inspired binary code similarity comparison beyond ...
 
Language grounding and never-ending language learning
Language grounding and never-ending language learningLanguage grounding and never-ending language learning
Language grounding and never-ending language learning
 
Glove global vectors for word representation
Glove global vectors for word representationGlove global vectors for word representation
Glove global vectors for word representation
 
Spam text message filtering by using sen2 vec and feedforward neural network
Spam text message filtering by using sen2 vec and feedforward neural networkSpam text message filtering by using sen2 vec and feedforward neural network
Spam text message filtering by using sen2 vec and feedforward neural network
 
Word embedding method of sms messages for spam message filtering
Word embedding method of sms messages for spam message filteringWord embedding method of sms messages for spam message filtering
Word embedding method of sms messages for spam message filtering
 
Memory Networks
Memory NetworksMemory Networks
Memory Networks
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
How to use tensorflow
How to use tensorflowHow to use tensorflow
How to use tensorflow
 
Natural language processing open seminar For Tensorflow usage
Natural language processing open seminar For Tensorflow usageNatural language processing open seminar For Tensorflow usage
Natural language processing open seminar For Tensorflow usage
 
large-scale and language-oblivious code authorship identification
large-scale and language-oblivious code authorship identificationlarge-scale and language-oblivious code authorship identification
large-scale and language-oblivious code authorship identification
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
NLTK practice with nltk book
NLTK practice with nltk bookNLTK practice with nltk book
NLTK practice with nltk book
 
Skip gram and cbow
Skip gram and cbowSkip gram and cbow
Skip gram and cbow
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 

Universal dependencies for Finnish

  • 1.
  • 2.
  • 3.  About The paper  There has been substantial recent interest in annotation schemes that can be applied consistently to many languages - The Universal Dependencies (UD) project seeks to introduce a cross-linguistically applicable part- of-speech tagset, feature inventory, and set of dependency relations as well as a large number of uniformly annotated treebank  The paper(2015) said universal dependency for Finnish is one of ten language in recent first releas of UD project treebank data.
  • 4.  This paper details the mapping of previously introduce annotation to the UD standard  This paper additionally presents parsing experiments. - comparing the performance of a state-of-the-art parser trained on a language specific annotation schema to performance on the corresponding UD annotation. - Tool and resources mentioned in this paper are available under open licenses from http://bionlp.utu.fi/ud-finnish.html  Conversion From Turku Dependence Treebank to Universal Dependencies  What is the treebank(Wikipedia)? - in linguistics, a treebank is a parsed text corpus that annotates syntactic and semantic sentences structure. Most syntactic treebank annotate variants of either phrase structure(left) or dependency structure(right)
  • 5.  Universal dependency builds on the Google universal part-of-speech(POS) tagset, The interest interlingua of morphosyntactic features, and Stanford Dependency  Universal dependency defines also a treebank storage format, CoNLL-U.  In this paper, they presents the adaptation of the UD guidelines to Finnish and the creation of the UD Finnish treebank by conversion of the previously introduced Turku Dependency Treebank(TDT).  They present the comparison of parsing score between language-specific treebank annotation to that of UD annotation Most syntactic treebank annotate variants of either phrase structure(left) or dependency structure(right)
  • 6.  The initial stage of the work is identifying similarities and difference between the TDT and UD annotation guidelines, adapting the general UD guidelines and planning the implementation of the conversion . - Technically, the conversion is on pipeline of processing components, each of which consumed and produced CoNLL-U formatted data.  In this paper, TDT is the source for the conversion. It consists of 15,000 sentences(200,000 words) draw from a variety of sources and annotated in a Finnish-specific version of the Stanford Dependencies(SD) scheme.  They also addressed several instances. - tokenization differed from UD specifications. - corrected a small number of sentence-splitting errors. - updated the lemmas to improve both treebank-internal consistency and conformance with the UD specification  They further introduced a fully manually annotated morphology layer, replacing the automatically generated morphological annotation of the initial data.
  • 7.  The Universal specification defines 17 POS tags, and requires that all conforming treebank use only these tags - While no language-specific POS tags can thus be defined in the primary POS annotation, the CoNLL-U format allows a secondary POS tag to be assigned to each word to preserve treebank-specific information.  In the case of conjunctions from subordinating conjunctions (CONJ and SCONJ in UD, respectively).  Just four TDT tags, marking pronouns, punctuation, symbols and verbs, required further information to resolve correctly.
  • 8.  : TDT-style vs UD style for syntax and part-of-speech annotation for a Finnish sentence.
  • 9.  The Universal Dependencies specification defines a set of 17 widely attested Morphological features such as Case, Person, Number, Voice and Mood. - However, by contrast to the POS tag annotation, the specification allows conforming treebanks to introduce language specific features that are not included in this universal inventory.
  • 10.  UD defines as set of 40 broadly applicable dependency relation, further allowing language-specific subtypes of these to be defined to meet the needs of specifics resources. - Unlike the fairly straightforward mapping from morphological annotation, the conversion from TDT dependency annotation to UD often required not only relabeling types, abut also changes to the tree structure.
  • 11.  The UD syntactic annotation is based on the universal Stanford Dependencies(SD) scheme.  The total number of dependency relation types defined in UD Finnish is 43, consisting of 32 universal relations and 11 language-specific sub-types.  In the original TDD annotation, 46 dependency types are used, with an additional 4 types to mark non- tree structures used in the second annotation layer of TDT.  Null token for omitted element in sentence BUT In UD, it is not used. Instead In UD, remnant relation.  TDT-style(top) vs UD-style(bottom)
  • 12.  The figure below shows tokens statistics for the 10 languages for which treebanks were included in the initial UD data release. - with over 200,000 tokens, The UD Finnish treebank is in a mid-size cluster among the UD version 1 languages together with German, English, and Italian.
  • 13.  As baseline, we consider the most recent Finnish dependency distribution parser trained and evaluated on the original distribution of TDT. - they said that the conversion to UD has had no major effect on the parsing accuracy.
  • 14.  Soft constraint approach is where tags given by marmot are preserved, and OMorFi is only used to select the lemma.  Hard-pos constraint is where the predicted POS can be found among the analyses given by OMoriFi
  • 15.  They have presented Universal Dependencies(UD) for Finnish, detailing the application of general UD guidelines to the annotation of parts-of-speech, morphological feature, and dependency relations in Finish and introducing a conversion from the previously released TurKu Dependency Treebank corpus into the UD Finnish treebank released in the first UD data release.
  • 16.  UniversalDependencies.org  The paper – Universal Dependencies for Finnish