What is the state of natural language processing for Danish in 2018? This presentation reviews language technology in Denmark this year. Presented at a "Puzzle of Danish" workshop.
Word Occurrence Based Extraction of Work Contributors from Statements of Resp... - The European Library
This paper addresses the identification of all contributors to an intellectual work when they are recorded in bibliographic data in unstructured form. National bibliographies are very reliable in representing the first author of a work, but secondary contributors are frequently represented only in the statements of responsibility that the cataloguer transcribes from the book into the bibliographic record. Identifying work contributors mentioned in statements of responsibility is a typical motivation for applying information extraction techniques. This paper presents an approach developed for the specific application scenario of the ARROW rights infrastructure, which is being deployed in several European countries to assist in determining the copyright status of works that may not be in the public domain. Our approach performed reliably across languages and bibliographic datasets of at least one million records, achieving precision and recall above 0.97 on five of the six evaluated datasets. We conclude that the approach can be reliably applied to other national bibliographies and languages.
This document provides an overview of semantic web technologies for publishing data. It introduces the semantic web and describes semantic web languages like RDF, RDF Schema, and OWL. These languages allow modeling data as graphs and defining ontologies to provide unambiguous meaning to information. The document discusses using these languages to publish structured data on the web in ways that enable semantic annotation, integration, and reasoning across interconnected data sources.
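As a hedged illustration of this graph-based modelling, the sketch below (assuming the Python rdflib package and a made-up http://example.org/ namespace, neither of which comes from the original document) builds a tiny RDF graph and serialises it as Turtle.

```python
# Minimal sketch: modelling data as an RDF graph of subject-predicate-object
# triples and serialising it as Turtle. Assumes the rdflib package; the
# http://example.org/ namespace and its properties are made up for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # hypothetical vocabulary

g = Graph()
book = URIRef("http://example.org/book/1")
g.add((book, RDF.type, EX.Book))                                  # type the resource
g.add((book, EX.title, Literal("An Example Title")))              # literal-valued property
g.add((book, EX.author, URIRef("http://example.org/person/42")))  # link to another resource

print(g.serialize(format="turtle"))  # Turtle is one of several RDF syntaxes
```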
JRC-Names: A freely available, highly multilingual named entity resource. This presentation was part of the Diplohack Brussels Data Market on 29-30 April 2016.
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+) - SoundSoftware ac.uk
Introductory presentation about the SoundSoftware.ac.uk project: Sustainable software for audio and music research.
Presented at the DMRN+5: Digital Music Research Network One-day Workshop 2010, at Queen Mary, University of London, on 21st Dec 2010.
Research Objects for improved sharing and reproducibility - Oscar Corcho
Presentation about the usage of Research Objects to improve scientific experiment sharing and reproducibility, given at the Dagstuhl Perspective Workshop on the intersection between Computer Sciences and Psychology (July 2015)
GTTS System for the Spoken Web Search Task at MediaEval 2012 - MediaEval2012
The document describes the GTTS system for the Spoken Web Search Task at MediaEval 2012. The GTTS system allows for searching spoken queries against broadcast news audio using automatic speech recognition and indexing. It also enables searching parliamentary sessions by aligning audio and text. For MediaEval 2012, the GTTS system performs phonetic matching between the n-best phone decoding of queries and phone lattices of spoken resources to locate query detections. Initial experiments showed poor performance, so the approach was changed to search for each query's single best detection in each audio document.
This slide deck introduces various basic steganography techniques.
It also lists tools that can be useful for CTF (Capture the Flag) steganography challenges.
Linked Data and cultural heritage data: an overview of the approaches from Eu... - The European Library
Europeana provides access to digital resources from a wide range of cultural heritage institutions all across Europe. In order to support Europeana, a wide network of organizations collaborates in data integration activities. The European Library plays the role of library-domain aggregator for Europeana, and its activities also include serving as a gateway to the collections and data of Europe’s national and research libraries, operating on the principle of open data for re-use.
The Europeana Network addresses its data integration challenges by leveraging Linked Data and the Semantic Web. Its approach to data integration is based on a single data model, the Europeana Data Model, which embraces Semantic Web principles to integrate the various data models and ontologies used in cultural heritage data.
The paradigm of Linked Data brings many new challenges to libraries. The generic nature of data representation used in Linked Data, while allowing any community to manipulate the data, also opens many paths for implementation, with no clear optimal choice for libraries. The European Library leverages its operational infrastructure to make library data available. It maintains The European Library Open Dataset, which is derived from the data aggregated from member libraries and made available under the Creative Commons CC0 1.0 Universal license, in order to promote and facilitate its reuse by any community.
Extensive linking is performed in the preparation of The European Library Open Dataset. It relies on Information Extraction and Data Mining to establish links to external open datasets, covering the most prominent entity types present in library data: persons, corporate bodies, places, concepts, intellectual works and manifestations.
The European Library also applies a linked data approach to intellectual property rights clearance processes, in support of mass digitization projects. This approach is applied within the European ARROW rights infrastructure.
Semantic Technologies and Programmatic Access to Semantic Data Steffen Staab
This is a talk given at the Semantics@Roche Forum on September 8, 2015. It is a short version of the talk I gave in July at Summer School Semantic Web and really a subset of the slides I showed then.
This presentation describes a natural language interface: a system that transforms a user's natural language question into a SPARQL query (a sketch of executing such a generated query is given below).
Find related papers here: https://sites.google.com/site/fadhlinams81/publication
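For context, here is a minimal sketch of executing the kind of SPARQL query such an interface might generate for a question like "Which cities are in Denmark?". It assumes the SPARQLWrapper package and network access to the public DBpedia endpoint, and is not the system from the slides.

```python
# Minimal sketch: running a (hand-written) SPARQL query of the kind an NL
# interface might generate. Assumes SPARQLWrapper and the public DBpedia
# endpoint; not the system described in the presentation.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?city WHERE {
        ?city a dbo:City ;
              dbo:country dbr:Denmark .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["city"]["value"])  # URI of each matching city
```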
Why do they call it Linked Data when they want to say...? - Oscar Corcho
The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing the goodness of Linked Data with outsiders, but also when reviewing papers for the COLD workshop series, I find myself on many occasions going back to the principles in order to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
Apertium: a unique free/open-source MT system for related languages [but not ... - Gema Ramirez-Sanchez
The document provides an overview of Apertium, an open-source machine translation system. It describes Apertium's main components, including its engine, data in XML formats, and tools. It also discusses Apertium's ready-to-use products, licensing as free/open-source, active community of hundreds of developers, research uses, over 40 supported language pairs including smaller languages, and some success stories in localization.
This document discusses the Web Ontology Language (OWL). It begins by providing motivation for OWL, noting limitations of RDF and RDF Schema in areas like expressiveness. It then outlines the technical solution of OWL, including its design goals of being shareable, changing over time, ensuring interoperability, and balancing expressiveness with complexity. Finally, it introduces the three dialects of OWL - OWL Lite, OWL DL, and OWL Full - and their different levels of expressiveness and reasoning capabilities.
WhiteLab is a web application that allows users to explore and search the large Dutch text collections SoNaR-500 and CGN. It provides access to the texts, audio, transcriptions, and linguistic annotations. Users can view collection composition and statistics, search by words, parts of speech, or lemmas using the CQP query language, and view concordance results and linked audio/context. OpenSoNaR-CGN was developed by several Dutch institutions to make these annotated resources openly available.
This document summarizes a presentation on the OpenNLP toolkit. OpenNLP is an open-source Java toolkit for natural language processing. It provides common NLP features like tokenization, sentence segmentation, part-of-speech tagging, and named entity extraction. The presentation discusses how these features work using pre-trained models for different languages. An example is also given showing how OpenNLP could be used to extract tags from a website and display them in a tag cloud. The presentation concludes by providing contact information for the presenter.
This document contains slides from a presentation by Pedro Szekely on RDF and related Semantic Web topics. The slides cover Unicode, URLs, URIs, namespaces, XML, XML Schema, RDF graphs, RDF syntaxes including XML and Turtle formats, and comparisons between XML and RDF. Key topics include using URIs to identify resources on the web, representing information as subject-predicate-object triples in RDF graphs, combining vocabularies using namespaces, and leveraging XML tools while making RDF more human-readable.
This document discusses various techniques for question answering and relation extraction in natural language processing. It provides an overview of question answering systems and approaches, including examples like START, Ask Jeeves and Siri. It also discusses using search engines for question answering, relation extraction from questions, and common evaluation metrics for question answering systems like accuracy and mean reciprocal rank.
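As a small worked example of one metric mentioned above, the sketch below computes mean reciprocal rank (MRR) from a hypothetical list of ranks at which the first correct answer appeared for each question.

```python
# Minimal sketch: mean reciprocal rank (MRR), a common QA evaluation metric.
# `first_correct_ranks` is a hypothetical list holding the 1-based rank of the
# first correct answer per question, or None if no correct answer was returned.
def mean_reciprocal_rank(first_correct_ranks):
    reciprocal = [1.0 / r if r is not None else 0.0 for r in first_correct_ranks]
    return sum(reciprocal) / len(reciprocal)

# Three questions: correct answer at rank 1, at rank 3, and never returned.
print(mean_reciprocal_rank([1, 3, None]))  # (1 + 1/3 + 0) / 3 ≈ 0.444
```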
Building NLP solutions for Davidson ML Group - botsplash.com
This document provides an overview of natural language processing (NLP) and discusses various NLP applications and techniques. It covers the scope of NLP including natural language understanding, generation, and speech recognition/synthesis. Example applications mentioned include chatbots, sentiment analysis, text classification, summarization, and more. Popular Python packages for NLP like NLTK, SpaCy, and Gensim are also highlighted. Techniques like word embeddings, neural networks, and deep learning approaches to NLP are briefly outlined.
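For a concrete taste of one of the packages mentioned, here is a minimal sketch (assuming spaCy and its small English model en_core_web_sm are installed) that runs tokenisation, part-of-speech tagging and named entity recognition on a single sentence.

```python
# Minimal sketch: basic NLP with spaCy. Assumes spaCy is installed and the
# small English model has been downloaded (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Copenhagen next year.")

for token in doc:
    print(token.text, token.pos_)   # each token with its part-of-speech tag

for ent in doc.ents:
    print(ent.text, ent.label_)     # named entities, e.g. ORG, GPE, DATE
```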
Agile Offshoring: Using Pair Work to Overcome Nearshoring Difficulties - MobileSolutionsDTAG
This document discusses using distributed pair programming (DPP) to overcome challenges with nearshoring software development. It proposes "Agile Offshoring" which maximizes cross-site communication through DPP instead of separating technical work by location. Key steps include finding pairing volunteers, establishing technical infrastructure like Saros for distributed editing, arranging knowledge transfer tasks between pairs, and refining the process through reflection. Research shows competent pairs can work fluently with DPP and it facilitates knowledge transfer, though awareness limitations must be addressed. The approach was deemed plausible by experts and is being evaluated further.
The document discusses the future of the Digital Curation Centre (DCC) and its role as a center of expertise in data curation and preservation. It outlines the DCC's proposed core services for the next phase, including providing reference resources, training, expertise/consultancy, community building, and tools/toolkits. It also discusses potential additional services and ensuring the DCC complements rather than conflicts with the UK Research Data Service.
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2... - Europeana
Here are a few approaches to address the context demand challenge for machine translation of cultural heritage content:
- Leverage knowledge graphs and ontologies to disambiguate terms based on conceptual relationships
- Train domain-specific models on large cultural heritage corpora to capture nuances of language use in different contexts
- Perform multi-task learning to optimize models for both translation accuracy and conceptual mapping between languages
- Allow users to provide feedback to iteratively improve disambiguation of ambiguous terms over time
- Develop specialized interfaces that surface contextual clues from objects to help machine translation
The goal is to mimic how humans understand intended meaning based on surrounding context clues. Combining linguistic and conceptual techniques can help machines do the same.
The document provides an introduction to a course on natural language processing, outlining the course overview, topics to be covered including introductions to NLP and Watson, machine learning for NLP, and why NLP is difficult. It provides information on the course instructor, teaching assistant, homepages, office hours, goals and topics of the course, organization, recommended textbooks, assignments, grading, class policies, and an outline of course topics.
Apertium is a free/open-source machine translation platform that provides an engine, linguistic data, and tools for rule-based machine translation of related languages. It supports over 40 language pairs and has an active international community of hundreds of developers. Apertium translations have been used successfully in localization projects, content translation on Wikipedia, and crisis translation for under-resourced languages.
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services - Lynx Project
Free Webinar on the Lynx Services Platform LySP: Architecture and basic Services
The main objective of the Lynx research and innovation project is to create an ecosystem of smart cloud services to better manage compliance, based on a Legal Knowledge Graph (LKG) which integrates and links multilingual and heterogeneous compliance data sources, including legislation, case law, standards, regulations and private contracts, among others.
This webinar provides insights into all smart services of the Lynx Services Platform (LySP), including demos of these LySP services, for instance: Named Entity Recognition (NER) by DFKI, Relation Extraction and Question Answering by SWC, Machine Translation by Tilde, and the Lexicala cross-lingual lexical data service by KDictionaries.
The document summarizes efforts to support digital humanities research through collaboration at various institutions. It describes projects at Wheaton College involving students encoding a text using TEI XML under faculty supervision. It also discusses initiatives at the University of Vermont and Brown University to provide infrastructure and expertise for digital scholarship through partnerships between libraries, academic technology groups, and faculty researchers.
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Natural Language Processing: L01 introduction - ananth
This presentation introduces the course Natural Language Processing (NLP) by enumerating a number of applications, course positioning, challenges presented by Natural Language text and emerging approaches to topics like word representation.
Rudy Marsman's thesis presentation slides: Speech synthesis based on a limite... - Victor de Boer
This document discusses using a limited speech corpus of recordings from Dutch news anchor Philip Bloemendal to develop a text-to-speech (TTS) engine. It evaluates how much of the Dutch language can be synthesized using the corpus and methods to improve it, like finding synonyms and decompounding compounds. It also explores using neural networks to colorize old black-and-white video footage from the archive to make it more engaging for viewers. While the TTS engine works well for common words, full sentences have lower coverage, and colorization introduces artifacts but can increase attention to the archive's collection.
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference - Basis Technology
What’s all this cloud stuff, anyway? What kinds of problems do organizations set out to solve with ‘a cloud,’ or even ‘the cloud’? What are a few of the major government initiatives involving this technology? How does HLT in general, and Search in particular, fit?
This talk will take a tour of the technology behind clouds and the sometimes-foggy ambitions of the projects that use them, and look in particular detail at the challenges of applying cloud technologies to Text Analytics.
View more slides from the Human Language Technology Conference 2012 here: http://info.basistech.com/hlt-2012-slides
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D... - Machine Learning Prague
TR Discover is an NLP tool that allows users to search Thomson Reuters databases using natural language queries. It uses a context-free grammar and logical semantics to parse queries and translate them into SQL or SPARQL queries. For the query "Drugs developed by Merck", it generates a SPARQL query to retrieve drugs developed by the company Merck. The system provides query autocompletions based on the grammar and relationships in the knowledge graph to guide users. Working as a scientist within Thomson Reuters provides applied research opportunities while requiring technology to support business needs around privacy, customization for different markets, and long-term client relationships.
The Semantic Web - Interacting with the Unknown - Steffen Staab
When developing user interfaces for interacting with data and content one typically assumes that one knows the type of data and one knows how to interact with such type of data. The core idea of the Semantic Web is that data is self-describing, which implies that its semantics is not designed and described at an initial point in time, but it rather emerges by its use. This flexibility is one of the greatest assets of the Semantic Web, but it also severely handicaps intelligent interaction with its data.
In this talk, we will sketch the principal problem as well as first steps to deal with the problem of interacting with the unknown.
Introduction to natural language processing (NLP) - Alia Hamwi
The document provides an introduction to natural language processing (NLP). It defines NLP as a field of artificial intelligence devoted to creating computers that can use natural language as input and output. Some key NLP applications mentioned include data analysis of user-generated content, conversational agents, translation, classification, information retrieval, and summarization. The document also discusses various linguistic levels of analysis like phonology, morphology, syntax, and semantics that involve ambiguity challenges. Common NLP tasks like part-of-speech tagging, named entity recognition, parsing, and information extraction are described. Finally, the document outlines the typical steps in an NLP pipeline including data collection, text cleaning, preprocessing, feature engineering, modeling and evaluation.
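To make the listed pipeline stages concrete, the sketch below (assuming scikit-learn and a toy labelled dataset invented for the example) goes from raw text through TF-IDF feature engineering to modelling and evaluation.

```python
# Minimal sketch of a text-classification pipeline: preprocessing and feature
# engineering (TF-IDF), modelling (logistic regression) and evaluation.
# Assumes scikit-learn; the four labelled examples are invented toy data.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = ["pos", "neg", "pos", "neg"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels)

model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),  # cleaning + feature engineering
    ("clf", LogisticRegression()),               # modelling
])
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))  # evaluation
```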
Natural language processing for requirements engineering: ICSE 2021 Technical... - alessio_ferrari
These are the slides for the technical briefing given at ICSE 2021 by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
Similar to State of Tools for NLP in Danish: 2018
The net is rife with rumours that spread through microblogs and social media. Not all the claims in these can be verified. However, recent work has shown that the stances alone that commenters take toward claims can be sufficiently good indicators of claim veracity, using e.g. an HMM that takes conversational stance sequences as the only input. Existing results are monolingual (English) and mono-platform (Twitter). This paper introduces a stance-annotated Reddit dataset for the Danish language, and describes various implementations of stance classification models. Of these, a linear SVM predicts stance best, with 0.76 accuracy / 0.42 macro F1. Stance labels are then used to predict veracity across platforms and also across languages, training on conversations held in one language and using the model on conversations held in another. In our experiments, monolingual scores reach stance-based veracity accuracy of 0.83 (F1 0.68); applying the model across languages predicts veracity of claims with an accuracy of 0.82 (F1 0.67). This demonstrates the surprising and powerful viability of transferring stance-based veracity prediction across languages.
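As a rough, hypothetical illustration of the reported setup (not the paper's actual features or data), the sketch below trains a linear SVM stance classifier with scikit-learn and scores it with accuracy and macro-averaged F1, the two metrics quoted above.

```python
# Minimal sketch: a linear SVM stance classifier evaluated with accuracy and
# macro F1. Assumes scikit-learn; the tiny Danish examples and the TF-IDF
# features are invented for illustration and are not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score

train_texts = ["det passer ikke", "enig, det er sandt", "er det rigtigt?", "ok"]
train_stances = ["deny", "support", "query", "comment"]
test_texts = ["helt enig", "det tvivler jeg på"]
test_stances = ["support", "deny"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

clf = LinearSVC().fit(X_train, train_stances)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(test_stances, pred))
print("macro F1:", f1_score(test_stances, pred, average="macro"))
```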
This document describes SemEval-2017 Task 8 on determining rumour veracity and stance. It introduces two subtasks: (A) determining the stance of statements as supporting, denying, querying, or commenting on rumours and (B) determining the veracity of rumours as true, false, or unknown. The document outlines the data provided for training, development and testing, which covers several rumour events. It provides the participant numbers for the two subtasks and discusses the difficulty of the tasks. The document concludes by thanking the participants and SemEval committee.
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource - Leon Derczynski
This presents a new resource for helping to find names of entities in social media. It takes an inclusive approach, meaning we get high variety in named entities - something other corpora have struggled with, leaving them poorly placed to help machine learning approaches generalise beyond the lexical level.
Handling and Mining Linguistic Variation in UGC - Leon Derczynski
This document discusses user-generated content (UGC) found on social media and the linguistic variation present within it. It notes that UGC comes directly from end users without editing and contains nonstandard spelling, grammar, slang, and abbreviations. The document qualitatively and quantitatively analyzes the nature of this variation, including its relationship to social factors. It also discusses challenges this variation poses for natural language processing systems and different approaches that have been explored to better handle UGC, such as distributional semantic models, normalization, and leveraging author metadata.
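As a toy illustration of the normalization idea mentioned among those approaches, the sketch below uses a small, hand-written (hypothetical) lookup table to map common nonstandard spellings to canonical forms.

```python
# Minimal sketch: dictionary-based lexical normalisation of user-generated
# content. The lookup table is hand-written and purely illustrative.
NORMALISATION = {
    "u": "you",
    "gr8": "great",
    "2moro": "tomorrow",
    "pls": "please",
}

def normalise(text):
    # Replace each whitespace-separated token if it has a canonical form.
    return " ".join(NORMALISATION.get(tok.lower(), tok) for tok in text.split())

print(normalise("c u 2moro pls"))  # -> "c you tomorrow please"
```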
Efficient named entity annotation through pre-empting - Leon Derczynski
Linguistic annotation is time-consuming and expensive. One common annotation task is to mark entities – such as names of people, places and organisations – in text. In a document, many segments of text often contain no entities at all. We show that these segments are worth skipping, and demonstrate a technique for reducing the amount of entity-less text examined by annotators, which we call "preempting". This technique is evaluated in a crowdsourcing scenario, where it provides downstream performance improvements for the same size corpus.
A light intro to natural language processing on social media, presented as an invited talk at the University of Sheffield Engineering Symposium 2014 in the AI session. As well as an introduction to the area, this presentation covers powerful real-world applications of social media, and touches on the work we do in the Sheffield NLP group.
Video cast: https://www.youtube.com/watch?v=QUbRmUinhHw&feature=youtu.be
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines - Leon Derczynski
Annotating data is expensive and often fraught. Crowdsourcing promises a quick, cheap and high-quality solution, but it is critical to understand the process and plan work appropriately in order to get results. This presentation and paper discuss the challenges involved and explain simple ways of getting reliable, quality results when crowdsourcing corpora.
Full paper: https://gate.ac.uk/sale/lrec2014/crowdsourcing/crowdsourcing-NLP-corpora.pdf
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec... - Leon Derczynski
Presentation with audio: https://www.youtube.com/watch?v=heYj8sCmWCo
Finding the names in tweets is difficult. However, with a few simple modifications to handle the noise and variety in tweets, and an automatic post-editor to fix errors made by the automatic systems, it becomes easier.
Full paper: http://derczynski.com/sheffield/papers/person_tweets.pdf
Natural Language Processing for the Social Media
A PhD course at the University of Szeged, organised by the FuturICT.hu project; 9-13 December 2013.
1. Twitter intro + JSON structure
2. Challenges in analysing social media: why traditional NLP models do not work well
3. GATE for social media
The document discusses several topics related to artificial intelligence including machine learning, evaluating AI, and big data from social media. It notes that machine learning allows computers to write programs themselves so humans can go drinking. Big data is defined using the three Vs: velocity of tweets, volume of active teenagers, and variety of data applications including virus prediction, earthquake detection, and discussions of Bieber.
Recognising and Interpreting Named Temporal Expressions - Leon Derczynski
Paper: http://derczynski.com/sheffield/papers/named_timex.pdf
This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora – for example Michaelmas or Vasant Panchami.
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text - Leon Derczynski
Code: http://gate.ac.uk/wiki/twitie.html
Paper: https://gate.ac.uk/sale/ranlp2013/twitie/twitie-ranlp2013.pdf
Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every stage. Additionally, it includes Twitter-specific data import and metadata handling. This paper introduces each stage of the TwitIE pipeline, which is a modification of the GATE ANNIE open-source pipeline for news text. An evaluation against some state-of-the-art systems is also presented.
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data - Leon Derczynski
Download software: http://gate.ac.uk/wiki/twitter-postagger.html
Original paper: http://derczynski.com/sheffield/papers/twitter_pos.pdf
Part-of-speech information is a pre-requisite in many NLP algorithms. However, Twitter text is difficult to part-of-speech tag: it is noisy, with linguistic errors and idiosyncratic style. We present a detailed error analysis of existing taggers, motivating a series of tagger augmentations which are demonstrated to improve performance. We identify and evaluate techniques for improving English part-of-speech tagging performance in this genre.
Further, we present a novel approach to system combination for the case where available taggers use different tagsets, based on vote-constrained bootstrapping with unlabeled data. Coupled with assigning prior probabilities to some tokens and handling of unknown words and slang, we reach 88.7% tagging accuracy (90.5% on development data). This is a new high in PTB-compatible tweet part-of-speech tagging, reducing token error by 26.8% and sentence error by 12.2%. The model, training data and tools are made available.
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr... - Leon Derczynski
Presented at the 4th DEOS workshop, http://diadem.cs.ox.ac.uk/deos13/
Social media presents itself as a context-rich source of big data, readily exhibiting volume, velocity and variety. Mining information from microblogs and other social media is a challenging, emerging research area. Unlike carefully authored news text and other longer content, social media text poses a number of new challenges, due to the short, noisy, context-dependent, and dynamic nature.
This talk will discuss firstly how Linked Open Data (LOD) vocabularies (namely DBpedia and YAGO) have been used to help entity recognition and disambiguation in such content. We will introduce LODIE, the LOD-based extension of the widely used ANNIE open-source entity recognition system. LODIE also includes entity disambiguation (covering products, as well as names of persons, locations, and organisations) and has been developed as part of the TrendMiner and uComp projects. Quantitative evaluation results will be shown, including a comparison against other state-of-the-art methods and an analysis of how errors in upstream linguistic pre-processing (i.e. tokenisation and POS tagging) can affect disambiguation performance. Our results demonstrate the importance of adjusting approaches for this genre.
The second half of the talk will focus on fine-grained events in tweets. Awareness of temporal context in social media enables many interesting applications. We identify events using the TimeML schema, focusing on occurrences and actions. Challenges of event annotation will be discussed, as well as the development of a supervised event extractor specifically for social media. We evaluate this against traditional event annotation approaches (e.g. Evita, TIPSem).
Determining the Types of Temporal Relations in Discourse - Leon Derczynski
Working out when events in a text happen is difficult. Many have tried over the past decade but the state of the art has not advanced.
After introducing a few fundamental concepts for dealing with time in language, we work out what makes this task so difficult, and then identify two common causes of temporal ordering difficulty and describe how to overcome them.
Full document: http://derczynski.com/sheffield/papers/derczynski-phdthesis.pdf
Microblog-genre noise and its impact on semantic annotation accuracy - Leon Derczynski
This document discusses challenges in applying natural language processing pipelines to microblog texts like tweets. Key challenges include non-standard language use, brevity, and lack of context. The document evaluates performance of typical NLP tasks on microblogs, like part-of-speech tagging and named entity recognition, and proposes approaches to address noise, such as customizing tools to the microblog genre and applying normalization techniques. It concludes that while performance is lower on microblogs, targeted approaches can provide gains and that leveraging additional context from metadata may further help analyze microblog language.
Empirical Validation of Reichenbach’s Tense Framework - Leon Derczynski
There exist formal accounts of tense and aspect, such as that detailed by Reichenbach (1947). Temporal semantics for corpus annotation are also available, such as TimeML. This paper describes a technique for linking the two, in order to perform a corpus-based empirical validation of Reichenbach's tense framework. It is found, via use of Freksa's semi-interval temporal algebra, that tense appropriately constrains the types of temporal relations that can hold between pairs of events described by verbs. Further, Reichenbach's framework of tense and aspect is supported by corpus evidence, leading to the first validation of the framework. Results suggest that the linking technique proposed here can be used to make advances in the difficult area of automatic temporal relation typing and other current problems regarding reasoning about time in language.
Towards Context-Aware Search and Analysis on Social Media Data - Leon Derczynski
Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to subject to context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.
Determining the Types of Temporal Relations in Discourse - Leon Derczynski
This document discusses determining the types of temporal relations in discourse. It introduces key temporal information extraction concepts like events, temporal expressions, and links between events and times. The document also examines relation extraction challenges, the role of temporal signals and tense in modelling temporal relations, and potential areas of future work such as temporal dataset construction.
TIMEN: An Open Temporal Expression Normalisation Resource - Leon Derczynski
We present TIMEN, a resource for building and sharing knowledge and rules for the TimeML temporal expression normalisation subtask - that is, the generation of a TIMEX3 annotation from a linguistic temporal expression. This provides a strong basis, built from current best approaches, which is independent of the rest of the temporal expression processing subtasks. Therefore, it can easily be integrated as a module in temporal information processing systems.
Since it is open it can be used, improved and extended by the community, in contrast to closed tools, which must be replicated from scratch as the field advances. Furthermore, TIMEN eases the development of normalization knowledge and rules for low-resourced languages since the normalization process is partially shared between languages.
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
HCL Notes and Domino license cost reduction in the world of DLAU - panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We will explain how to fix common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep track of what is going on. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... - Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
3. Language resources: datasets
Genre | Danish | Faroese | West Greenlandic | East Greenlandic | North Greenlandic | English | Swedish
Fiction | Yes - UD | | | | | Yes - UD | Yes - UD
News | Yes - UD | | | | | Yes - UD | Yes - UD
Nonfiction | Yes - UD | | | | | Yes - UD | Yes - UD
Spoken | Yes - UD | | | | | Yes - UD | Yes - UD
Wikipedia (articles) | 239.715 | Yes - UD; 12.788 | 1.657 | 0 | 0 | Yes - UD; 5.708.356 | Yes - UD; 3.771.701
Reviews | Independent | | | | | Yes - UD |
Spoken, email, blog, social media, academic, legal, essays, fiction, learner, web | | | | | | Yes - UD |
Margaret Hamilton, Apollo project lead coder, with more text than we have for most of these languages
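The Wikipedia figures in the table above are article counts as of 2018. As a minimal sketch (not part of the original slides), those counts can be re-checked at any time through the MediaWiki siteinfo API; the language codes and the `requests` dependency are assumptions, and current numbers will differ from the 2018 snapshot.

```python
# Minimal sketch: fetch current article counts for the Wikipedias in the table
# above via the MediaWiki siteinfo API. Counts will differ from the 2018 figures.
import requests

# Language codes: Danish, Faroese, West Greenlandic (Kalaallisut), English, Swedish.
# East and North Greenlandic have no Wikipedia edition, hence the zeros in the table.
WIKIS = {"da": "Danish", "fo": "Faroese", "kl": "West Greenlandic",
         "en": "English", "sv": "Swedish"}

for code, name in WIKIS.items():
    resp = requests.get(
        f"https://{code}.wikipedia.org/w/api.php",
        params={"action": "query", "meta": "siteinfo",
                "siprop": "statistics", "format": "json"},
        timeout=30,
    )
    stats = resp.json()["query"]["statistics"]
    print(f"{name} ({code}.wikipedia.org): {stats['articles']} articles")
```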
4. Language resources: tools
• DKIE - a plugin for GATE, using Stanford CoreNLP
• tokenization
• PoS tagging
• NER
• daner, dapipe
• Wrapped tools using UD resources (see the sketch after this list)
• Free from ITU (http://nlp.itu.dk)
• VISL
• Grammar, ML for some pairs
• Pretty good resource - but now old (bit rot?)
• Closed source
• http://visl.sdu.dk
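dapipe and daner are distributed by ITU and their exact command-line interfaces are not shown here. As an illustrative sketch of the same kind of UD-resource-backed pipeline, the `ufal.udpipe` Python bindings can run tokenisation, tagging and parsing with a Danish DDT model; the model filename below is a placeholder for whatever model file you have downloaded.

```python
# Minimal sketch of a UD-based Danish pipeline (tokenisation, tagging, parsing)
# using the ufal.udpipe bindings directly; dapipe wraps this kind of resource.
# "danish-ddt.udpipe" is a placeholder name for a locally downloaded Danish DDT model.
from ufal.udpipe import Model, Pipeline, ProcessingError

model = Model.load("danish-ddt.udpipe")
if model is None:
    raise RuntimeError("could not load the UDPipe model file")

pipeline = Pipeline(model, "tokenize", Pipeline.DEFAULT, Pipeline.DEFAULT, "conllu")
error = ProcessingError()

conllu_output = pipeline.process("Der bor mange mennesker i København.", error)
if error.occurred():
    raise RuntimeError(error.message)
print(conllu_output)  # one CoNLL-U block per sentence
```

The CoNLL-U output can then feed any downstream tool that already understands the UD format.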
Fig. 1: Kamelåså Syggelekokle
5. Language resources: standards
• PAROLE (FP3)
• TEI (2001-)
• CLARIN (2012-)
• CST actually have some pretty big resources, handy for domain adaptation
• Conversions to partial UD
• DDT (2004; PAROLE format)
• CDT (2009; Discontinuous Grammar)
6. Exploitable tech
• Sentiment ❌
• IE
• Triples ❌
• NER ❌
• Stance ❌
• Events
• Frames ❌
• Who-did-what-to-whom ❌
• Legal
• Compliance ❌
• Discovery ❌
• Clinical
• IE ❌
• Events ❌
• MH ❌
• Social media ❌
“12,3 milliard er vi ned” (“we are down 12.3 billion”)
7. Danish LT at ITU
• Hub for Danish language technology
• Four faculty
• Zeljko Agic:
• multilinguality, representations
• Leon Derczynski
• NLPL project, clinical, stance, social media
• Barbara Plank
• Fundamental processing tools
• Natalie Schluter
• Decoding algorithms in NLP (esp. parsing and summarisation)
• Learning algorithms for NLP
• UD Treebanking for Danish (hopefully Faroese), with extensions
• …and looking for collaborators!
• Theory of Hacks for the Machine Learning Practitioner
• Related: Deep Learning for NLP
• Resources: dapipe, daner
• nlp.itu.dk
ITU NLP - @NLPatITU
8. Funding situation
• DFF
• FTP: “This is too linguistic-y”
• ..FTP: “We don’t really fund CS anyway”
• FKK: “This is too computer-y”
• FNU: “We’re full”
• EC
• Dropped LT as specific funding category
I’m not funded well enough to caption this yet
9. Solutions: public education
• Inform the population
• People not familiar with NLP
• Ground-up approach: way too formal
• Top-down: introduce tools and their effects
• Political analysis, clinical analysis, business analysis
• Give up on local funding for basic NLP research
• Until the local population is as familiar with the tech as anglophone populations are
• What do we need to do to get there?
10. Solutions: build resources
• Scrape more (see the sketch after this list)
• Use CLARIN more
• Publish your damn data
• (everybody else is)
• (it’s 2018 and the world has changed again)
• Put resources on GitHub / Figshare
• No matter how trivial - e.g. much in NLPL is near-free to build, but not previously available
• Build equivalents of English LRs
• Directory of LRs for Danish languages?
• Maybe just re-use (or start using) LRE Map
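As a minimal sketch of the "scrape more" and "publish your data" points above (not the speaker's own tooling), the MediaWiki extracts API can pull plain-text Danish Wikipedia articles into a file ready to push to GitHub or deposit on Figshare; the article titles below are illustrative placeholders.

```python
# Minimal sketch: build a small plain-text Danish corpus from Wikipedia article
# extracts and save it to a file ready to version on GitHub or deposit on Figshare.
# The article titles below are illustrative placeholders.
import requests

TITLES = ["Danmark", "København", "Dansk (sprog)"]  # placeholder article titles
API = "https://da.wikipedia.org/w/api.php"

with open("da_wiki_sample.txt", "w", encoding="utf-8") as out:
    for title in TITLES:
        resp = requests.get(API, params={
            "action": "query", "prop": "extracts", "explaintext": 1,
            "titles": title, "format": "json",
        }, timeout=30)
        pages = resp.json()["query"]["pages"]
        for page in pages.values():
            text = page.get("extract", "")
            if text:
                out.write(f"# {title}\n{text}\n\n")
```

Wikipedia text is CC BY-SA, so the licence should travel with any redistributed dump.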
Fig. 2: “træls” (Jutlandic Danish for “tiresome”)
11. Solutions: standards
• Consider: task and format
• Meaning representation: AMR; Frames; entity linking
• Parsing: discontinuous grammar; UD; Stanford; ..PTB-like
• Morphology: UD; Danish-specific (see the sketch after this list)
• Translation: Word-aligned; comparable corpora
• Semantics: ISO standards; CLEF; interoperability (ISA workshops); SRL
• Coreference: SemEval; MUC; Stanford deps
• NLG (MT, summ): multiple summaries/target examples
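To make the "Morphology: UD" option concrete, here is a minimal sketch (assuming the third-party `conllu` package is installed) that reads a UD CoNLL-U file and prints each token's form, lemma, UPOS tag and morphological features; the Danish DDT filename is a placeholder for a locally downloaded treebank split.

```python
# Minimal sketch: inspect UD morphology in a CoNLL-U file with the `conllu` package.
# "da_ddt-ud-train.conllu" is a placeholder path for a locally downloaded
# Danish DDT training split.
from conllu import parse_incr

with open("da_ddt-ud-train.conllu", "r", encoding="utf-8") as f:
    for sentence in parse_incr(f):
        for token in sentence:
            feats = token["feats"] or {}
            upos = token.get("upos", token.get("upostag"))  # key name varies by version
            print(token["form"], token["lemma"], upos,
                  "|".join(f"{k}={v}" for k, v in feats.items()))
        break  # just the first sentence, for illustration
```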
Sabine Kirchmeier, director of Dansk Sprognævn (the Danish Language Council)
12. Solutions: funding routes
• Through the back door: application-oriented
• Political analysis: stance
• Business intelligence: sentiment, IE (NER & triples)
• Clinical: semantic extraction, parsing, IE (temporal)
• Architecture and city planning: spatial processing, IE (NER)
• Arctic collaboration: Faroese, Greenlandic