SlideShare a Scribd company logo
Standards of Tools for NLP in Danish
Leon Strømberg Derczynski

NLP Research Group

IT University of Copenhagen
The landscape of Danish LT
• Flat

• Featureless

• Yndigt
Language resources: datasets
Danish Faroese
West
Greenlandic
East
Greenlandic
North
Greenlandic
English Swedish
Fiction Yes - UD Yes - UD Yes - UD
News Yes - UD Yes - UD Yes - UD
Nonfiction Yes - UD Yes - UD Yes - UD
Spoken Yes - UD Yes - UD Yes - UD
Wikipedia 239.715
Yes - UD;

12.788
1.657 0 0
Yes - UD;

5.708.356
Yes - UD;

3.771.701
Reviews Independe
nt
Yes - UD
Spoken,
email, blog,
social
media,
academic,
legal,
essays,
fiction,
learner, web
Yes - UD
Margaret Hamilton, Apollo project lead coder, with

more text than we have for most of these languages
Language resources: tools
• DKIE - a plugin for GATE, using Stanford
CoreNLP

• tokenization

• PoS tagging

• NER

• daner, dapipe

• Wrapped tools using UD resources

• Free from ITU (http://nlp.itu.dk)

• VISL

• Grammar, ML for some pairs

• Pretty good resource - but now old (bit
rot?)

• Closed source

• http://visl.sdu.dk
Fig. 1: Kamelåså Syggelekokle
Language resources: standards
• PAROLE (FP3)

• TEI (2001-)

• CLARIN (2012-)

• CST actually have some pretty big
resources, handy for domain adaptation

• Conversions to partial UD

• DDT (2004; PAROLE format) 

• CDT (2009; Discontinuous Grammar)
Exploitable tech
• Sentiment ❌

• IE 

• Triples ❌

• NER ❌

• Stance ❌

• Events 

• Frames ❌

• Who-did-what-to-whom ❌

• Legal

• Compliance ❌

• Discovery ❌

• Clinical

• IE ❌

• Events ❌

• MH ❌

• Social media ❌
“12,3 milliard er vi ned”
Danish LT at ITU
• Hub for Danish language technology

• Four faculty

• Zeljko Agic:

• multilinguality, representations

• Leon Derczynski

• NLPL project, clinical, stance, social media

• Barbara Plank

• Fundamental processing tools

• Natalie Schluter

• Decoding algorithms in NLP (espec. parsing and
summarisation)

• Learning algorithms for NLP

• UD Treebanking for Danish (hopefully Faroese), with
extensions

• …and looking for collaborators!

• Theory of Hacks for the Machine Learning
Practitioner 

• Related: Deep Learning for NLP

• Resources: dapipe, daner

• nlp.itu.dk
ITU NLP - @NLPatITU
Funding situation
• DFF

• FTP: “This is too linguistic-y”

• ..FTP: “We don’t really fund CS anyway”

• FKK: “This is too computer-y”

• FNU: “We’re full”

• EC

• Dropped LT as specific funding category
I’m not funded well enough to caption this yet
Solutions: public education
• Inform the population

• People not familiar with NLP

• Ground up approach way to formal

• Top-down: introduce tools and their effects

• Political analysis, clinical analysis, business
analysis

• Give up on local funding for basic NLP research

• Until the local population is as familiar with
the tech as anglophone populations are

• What do we need to do to get there?
Solutions: build resources
• Scrape more

• Use CLARIN more

• Publish your damn data

• (everybody else is)

• (it’s 2018 and the world has changed again)

• Put resources on GitHub / Figshare

• No matter how trivial - e.g. much in NLPL is
near-free to build, but not previously available

• Build equivalents of English LRs

• Directory of LRs for Danish languages?

• Maybe just re-use (or start using) LRE Map
Fig. 2: “træls”
Solutions: standards
• Consider: task, and format

• Meaning representation: AMR; Frames; entity linking

• Parsing: discontinuous grammar; UD;
Stanford; ..PTB-like

• Morphology: UD; Danish-specific

• Translation: Word-aligned; comparable corpora

• Semantics: ISO standards; CLEF; interoperability
(ISA workshops); SRL

• Coreference: SemEval; MUC; Stanford deps

• NLG (MT, summ): multiple summaries/target
examples
Sabine Kirchmeier, direktør for Dansk Sprognævne
Solutions: funding routes
• Through the back door: application oriented

• Political analysis: stance

• Business intelligence: sentiment, IE (NER &
triples)

• Clinical: semantic extraction, parsing, IE
(temporal)

• Architecture and city planning: spatial
processing, IE (NER)

• Arctic collaboration: Faroese, Greenlandic
Let’s start

More Related Content

What's hot

Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data
Steffen Staab
 
semantic web & natural language
semantic web & natural languagesemantic web & natural language
semantic web & natural language
Nurfadhlina Mohd Sharef
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
Oscar Corcho
 
General Introduction for Semantic Web and Linked Open Data
General Introduction for Semantic Web and Linked Open DataGeneral Introduction for Semantic Web and Linked Open Data
General Introduction for Semantic Web and Linked Open Data
National Institute of Informatics (NII)
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...
Gema Ramirez-Sanchez
 
sw owl
 sw owl sw owl
Open sonar martinreynaert
Open sonar martinreynaertOpen sonar martinreynaert
Open sonar martinreynaert
CLARIAH
 
Webinar: OpenNLP and Solr for Superior Relevance
Webinar: OpenNLP and Solr for Superior RelevanceWebinar: OpenNLP and Solr for Superior Relevance
Webinar: OpenNLP and Solr for Superior Relevance
Lucidworks
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
Robert Viseur
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
Pedro Szekely
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
Marina Santini
 
Building OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsBuilding OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web tools
Melanie Courtot
 

What's hot (12)

Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data
 
semantic web & natural language
semantic web & natural languagesemantic web & natural language
semantic web & natural language
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
General Introduction for Semantic Web and Linked Open Data
General Introduction for Semantic Web and Linked Open DataGeneral Introduction for Semantic Web and Linked Open Data
General Introduction for Semantic Web and Linked Open Data
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...
 
sw owl
 sw owl sw owl
sw owl
 
Open sonar martinreynaert
Open sonar martinreynaertOpen sonar martinreynaert
Open sonar martinreynaert
 
Webinar: OpenNLP and Solr for Superior Relevance
Webinar: OpenNLP and Solr for Superior RelevanceWebinar: OpenNLP and Solr for Superior Relevance
Webinar: OpenNLP and Solr for Superior Relevance
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Building OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsBuilding OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web tools
 

Similar to State of Tools for NLP in Danish: 2018

Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...
Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...
Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...
LIBER Europe
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
botsplash.com
 
Agile Offsharing: Using Pair Work to Overcome Nearshoring Difficulties
Agile Offsharing: Using Pair Work to OvercomeNearshoring DifficultiesAgile Offsharing: Using Pair Work to OvercomeNearshoring Difficulties
Agile Offsharing: Using Pair Work to Overcome Nearshoring Difficulties
MobileSolutionsDTAG
 
The future of the DCC
The future of the DCCThe future of the DCC
The future of the DCC
Chris Rusbridge
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
buivantan_uneti
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...
Prompsit Language Engineering
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Project
 
Digital Humanities Research
Digital Humanities ResearchDigital Humanities Research
Digital Humanities Research
elli.m
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
IMPACT Centre of Competence
 
OWN-PT: Taking Stock
OWN-PT: Taking Stock OWN-PT: Taking Stock
OWN-PT: Taking Stock
Valeria de Paiva
 
Building NLP solutions using Python
Building NLP solutions using PythonBuilding NLP solutions using Python
Building NLP solutions using Python
botsplash.com
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
ananth
 
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...
Victor de Boer
 
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceA Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
Basis Technology
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Machine Learning Prague
 
The Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownThe Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the Unknown
Steffen Staab
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
Alia Hamwi
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
alessio_ferrari
 

Similar to State of Tools for NLP in Danish: 2018 (20)

Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...
Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...
Digital Humanities Clinics – Leading Dutch Librarians into DH. Lotte Wilms, N...
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
Agile Offsharing: Using Pair Work to Overcome Nearshoring Difficulties
Agile Offsharing: Using Pair Work to OvercomeNearshoring DifficultiesAgile Offsharing: Using Pair Work to OvercomeNearshoring Difficulties
Agile Offsharing: Using Pair Work to Overcome Nearshoring Difficulties
 
The future of the DCC
The future of the DCCThe future of the DCC
The future of the DCC
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
 
Digital Humanities Research
Digital Humanities ResearchDigital Humanities Research
Digital Humanities Research
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
OWN-PT: Taking Stock
OWN-PT: Taking Stock OWN-PT: Taking Stock
OWN-PT: Taking Stock
 
Building NLP solutions using Python
Building NLP solutions using PythonBuilding NLP solutions using Python
Building NLP solutions using Python
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite...
 
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceA Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
 
The Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownThe Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the Unknown
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 

More from Leon Derczynski

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and Veracity
Leon Derczynski
 
RumourEval
RumourEvalRumourEval
RumourEval
Leon Derczynski
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Leon Derczynski
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGC
Leon Derczynski
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-empting
Leon Derczynski
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social Media
Leon Derczynski
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Leon Derczynski
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Leon Derczynski
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social Media
Leon Derczynski
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I do
Leon Derczynski
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal Expressions
Leon Derczynski
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
Leon Derczynski
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
Leon Derczynski
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Leon Derczynski
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
Leon Derczynski
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracy
Leon Derczynski
 
Empirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkEmpirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense Framework
Leon Derczynski
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media Data
Leon Derczynski
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
Leon Derczynski
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation Resource
Leon Derczynski
 

More from Leon Derczynski (20)

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and Veracity
 
RumourEval
RumourEvalRumourEval
RumourEval
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGC
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-empting
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social Media
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social Media
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I do
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal Expressions
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracy
 
Empirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkEmpirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense Framework
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media Data
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation Resource
 

Recently uploaded

Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 

Recently uploaded (20)

Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 

State of Tools for NLP in Danish: 2018

  • 1. Standards of Tools for NLP in Danish Leon Strømberg Derczynski NLP Research Group IT University of Copenhagen
  • 2. The landscape of Danish LT • Flat • Featureless • Yndigt
  • 3. Language resources: datasets Danish Faroese West Greenlandic East Greenlandic North Greenlandic English Swedish Fiction Yes - UD Yes - UD Yes - UD News Yes - UD Yes - UD Yes - UD Nonfiction Yes - UD Yes - UD Yes - UD Spoken Yes - UD Yes - UD Yes - UD Wikipedia 239.715 Yes - UD;
 12.788 1.657 0 0 Yes - UD;
 5.708.356 Yes - UD;
 3.771.701 Reviews Independe nt Yes - UD Spoken, email, blog, social media, academic, legal, essays, fiction, learner, web Yes - UD Margaret Hamilton, Apollo project lead coder, with
 more text than we have for most of these languages
  • 4. Language resources: tools • DKIE - a plugin for GATE, using Stanford CoreNLP • tokenization • PoS tagging • NER • daner, dapipe • Wrapped tools using UD resources • Free from ITU (http://nlp.itu.dk) • VISL • Grammar, ML for some pairs • Pretty good resource - but now old (bit rot?) • Closed source • http://visl.sdu.dk Fig. 1: Kamelåså Syggelekokle
  • 5. Language resources: standards • PAROLE (FP3) • TEI (2001-) • CLARIN (2012-) • CST actually have some pretty big resources, handy for domain adaptation • Conversions to partial UD • DDT (2004; PAROLE format) • CDT (2009; Discontinuous Grammar)
  • 6. Exploitable tech • Sentiment ❌ • IE • Triples ❌ • NER ❌ • Stance ❌ • Events • Frames ❌ • Who-did-what-to-whom ❌ • Legal • Compliance ❌ • Discovery ❌ • Clinical • IE ❌ • Events ❌ • MH ❌ • Social media ❌ “12,3 milliard er vi ned”
  • 7. Danish LT at ITU • Hub for Danish language technology • Four faculty • Zeljko Agic: • multilinguality, representations • Leon Derczynski • NLPL project, clinical, stance, social media • Barbara Plank • Fundamental processing tools • Natalie Schluter • Decoding algorithms in NLP (espec. parsing and summarisation) • Learning algorithms for NLP • UD Treebanking for Danish (hopefully Faroese), with extensions • …and looking for collaborators! • Theory of Hacks for the Machine Learning Practitioner • Related: Deep Learning for NLP • Resources: dapipe, daner • nlp.itu.dk ITU NLP - @NLPatITU
  • 8. Funding situation • DFF • FTP: “This is too linguistic-y” • ..FTP: “We don’t really fund CS anyway” • FKK: “This is too computer-y” • FNU: “We’re full” • EC • Dropped LT as specific funding category I’m not funded well enough to caption this yet
  • 9. Solutions: public education • Inform the population • People not familiar with NLP • Ground up approach way to formal • Top-down: introduce tools and their effects • Political analysis, clinical analysis, business analysis • Give up on local funding for basic NLP research • Until the local population is as familiar with the tech as anglophone populations are • What do we need to do to get there?
  • 10. Solutions: build resources • Scrape more • Use CLARIN more • Publish your damn data • (everybody else is) • (it’s 2018 and the world has changed again) • Put resources on GitHub / Figshare • No matter how trivial - e.g. much in NLPL is near-free to build, but not previously available • Build equivalents of English LRs • Directory of LRs for Danish languages? • Maybe just re-use (or start using) LRE Map Fig. 2: “træls”
  • 11. Solutions: standards • Consider: task, and format • Meaning representation: AMR; Frames; entity linking • Parsing: discontinuous grammar; UD; Stanford; ..PTB-like • Morphology: UD; Danish-specific • Translation: Word-aligned; comparable corpora • Semantics: ISO standards; CLEF; interoperability (ISA workshops); SRL • Coreference: SemEval; MUC; Stanford deps • NLG (MT, summ): multiple summaries/target examples Sabine Kirchmeier, direktør for Dansk Sprognævne
  • 12. Solutions: funding routes • Through the back door: application oriented • Political analysis: stance • Business intelligence: sentiment, IE (NER & triples) • Clinical: semantic extraction, parsing, IE (temporal) • Architecture and city planning: spatial processing, IE (NER) • Arctic collaboration: Faroese, Greenlandic