
What can typological knowledge bases and language representations tell us about linguistic properties?


One of the core challenges in typology is to record properties of languages in a structured way. As a result of manual efforts, typological knowledge bases have emerged, which contain information about languages’ phonological, morphological and syntactic properties, as well as about language families. Ideally, such typological knowledge bases would provide useful information for multilingual NLP models to learn how to selectively share parameters.
A related area of research suggests a different way of encoding properties of languages, namely to learn language representation vectors directly from text documents.
In this talk, I will analyse and contrast these two ways of encoding linguistic properties, as well as present research on how the two can benefit one another.



  1. 1. Typ-NLP Workshop 1 August 2019 What can typological knowledge bases and language representations tell us about linguistic properties? Isabelle Augenstein* augenstein@di.ku.dk @IAugenstein http://isabelleaugenstein.github.io/ *Credit for many of the slides: Johannes Bjerva
  2. 2. Linguistic Typology 2 ● ‘The systematic study and comparison of language structures’ (Velupillai, 2012) ● Long history (Herder, 1772; von der Gabelentz, 1891; …) ● Computational approaches (Dunn et al., 2011; Wälchli, 2014; Östling, 2015, ...)
  3. 3. Why Computational Typology? 3 ● Answer linguistic research questions on large scale ● About relationships between languages ● About relationships between structural features of languages ● Facilitate multilingual learning ○ Cross-lingual transfer ○ Few-shot or zero-shot learning
  4. 4. How to Obtain Typological Knowledge? 4 ● Discrete representation of language features in typological knowledge bases ● World Atlas of Language Structures (WALS) ● Continuous representation of language features via language embeddings ● Learned via language modelling
  5. 5. Why Computational Typology? 5 ● Answer linguistic research questions on large scale ● Multilingual learning ○ Language representations ○ Cross-lingual transfer ○ Few-shot or zero-shot learning ● This talk: ○ Features in the World Atlas of Language Structures (WALS) ○ Computational Typology via unsupervised modelling of languages in neural networks
  6. 6. 6 World Atlas of Language Structures (WALS)
  7. 7. 7 World Atlas of Language Structures (WALS)
  8. 8. Can language representations be learned from data? Resources that exist for many languages: ● Universal Dependencies (>60 languages) ● UniMorph (>50 languages) ● New Testament translations (>1,000 languages) ● Automated Similarity Judgment Program (>4,500 languages) 8
  9. 9. Multilingual NLP and Language Representations ● No explicit representation ○ Multilingual Word Embeddings ● Google’s “Enabling zero-shot learning” NMT trick ○ Language given explicitly in input ● One-hot encodings ○ Languages represented as a sparse vector ● Language Embeddings ○ Languages represented as a distributed vector 9 (Östling and Tiedemann, 2017)
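The contrast between one-hot encodings and distributed language embeddings on this slide can be sketched as follows. The vectors are toy stand-ins, not the embeddings of Östling and Tiedemann (2017); the point is only that one-hot vectors keep all languages equidistant, while dense vectors can place related languages close together and be fed into the model alongside each token.

```python
import numpy as np

languages = ["fin", "est", "sme", "hun"]

# One-hot: sparse, and every pair of languages is equally distant,
# so relatedness between languages cannot be expressed.
one_hot = {lang: np.eye(len(languages))[i] for i, lang in enumerate(languages)}

# Dense language embeddings (random stand-ins here; in practice learned
# jointly with a multilingual model): related languages can end up close.
rng = np.random.default_rng(0)
embedding = {lang: rng.normal(size=8) for lang in languages}

def model_input(token_vec, lang):
    """Concatenate a token representation with the language embedding,
    in the spirit of Östling and Tiedemann (2017)."""
    return np.concatenate([token_vec, embedding[lang]])

x = model_input(np.zeros(16), "fin")
print(x.shape)  # (24,)
```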
  10. 10. Experimental Setup Data ● Pre-trained language embeddings (Östling and Tiedemann, 2017) ○ Trained via Language Modelling on New Testament data ● PoS annotation from Universal Dependencies for ○ Finnish ○ Estonian ○ North Sami ○ Hungarian Task ● Fine-tune language embeddings on PoS tagging ● Investigate how typological properties are encoded in these for four Uralic languages 10
  11. 11. Distributed Language Representations 11 • Language Embeddings • Analogous to Word Embeddings • Can be learned in a neural network without supervision
  12. 12. Language Embeddings in Deep Neural Networks 12
  13. 13. Talk Overview Part 1: Language Embeddings - Do they aid multilingual parameter sharing? - Do they encode typological properties? - What types of similarities between languages do they encode? Part 2: Typological Knowledge Bases - Can they be populated automatically? - Can they be used to discover typological implications? 13
  14. 14. Part 1: Language Embeddings 14
  15. 15. Parameter sharing between dependency parsers for related languages Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard EMNLP 2018 15
  16. 16. Cross-lingual sharing with language embeddings ● Do language embeddings help to learn soft sharing strategies? ● Use case: transition-based dependency parsing ● Types of parameters: ● Character embeddings ● Word embeddings ● Transition parameters (MLP) ● Ablation with language embedding concatenated with char, word or transition vector 16
  17. 17. Cross-lingual sharing with language embeddings 17
Table 1: Dataset characteristics
Lang | Tokens  | Family   | Word order
ar   | 208,932 | Semitic  | VSO
he   | 161,685 | Semitic  | SVO
et   | 60,393  | Finnic   | SVO
fi   | 67,258  | Finnic   | SVO
hr   | 109,965 | Slavic   | SVO
ru   | 90,170  | Slavic   | SVO
it   | 113,825 | Romance  | SVO
es   | 154,844 | Romance  | SVO
nl   | 75,796  | Germanic | No dom. order
no   | 76,622  | Germanic | SVO
  18. 18. Cross-lingual sharing with language embeddings 19 [Chart: average scores per sharing strategy] • Mono: single-task baseline • Lang-best: best sharing strategy for each language • Best: best sharing strategy across languages (char not shared, word shared, transition shared with language embedding) • All: all parameters shared • Soft: sharing learned using language embeddings
  19. 19. Related vs. Unrelated Languages 22 [Chart: average scores per sharing strategy] • Mono: single-task baseline • Lang-best: best sharing strategy for each language • Best: best sharing strategy across languages (char not shared, word shared, transition shared with language embedding) • All: all parameters shared • Soft: sharing learned using language embeddings
  20. 20. Tracking Typological Traits of Uralic Languages in Distributed Language Representations Johannes Bjerva, Isabelle Augenstein IWCLUL 2018 24
  21. 21. Language Embeddings in Deep Neural Networks 25 1. Do language embeddings aid multilingual modelling? 2. Do language embeddings contain typological information?
  22. 22. Model performance (Monolingual PoS tagging) 26 • Compared to most frequent class baseline (black line) • Model transfer between Finnic languages relatively successful • Little effect from language embeddings (to be expected)
  23. 23. Model performance (Multilingual PoS tagging) 27 • Compared to monolingual baseline (black line) • Model transfer between Finnic languages outperforms monolingual baseline • Language embeddings improve multilingual modelling
  24. 24. Tracking Typological Traits (full language sample) 28 • Baseline: Most frequent typological class in sample • Language embeddings saved at each training epoch • Separate Logistic Regression classifier trained for each feature and epoch • Input: Language embedding • Output: Typological class • Typological features encoded in language embeddings change during training
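The probing setup described on this slide can be sketched roughly as below. The language embeddings and the binary typological feature are synthetic stand-ins (in the talk, embeddings are saved per epoch and one classifier is trained per feature); the logistic-regression probe is a minimal hand-rolled version, not the authors' code.

```python
import numpy as np

# Synthetic stand-ins: 40 "language embeddings" and one binary
# typological feature that happens to be linearly encoded in them.
rng = np.random.default_rng(1)
n_langs, dim = 40, 16
embeddings = rng.normal(size=(n_langs, dim))
labels = (embeddings[:, 0] > 0).astype(float)

def fit_probe(X, y, epochs=200, lr=0.5):
    """Hand-rolled logistic-regression probe (full-batch gradient descent)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_probe(embeddings, labels)
pred = (embeddings @ w + b) > 0
accuracy = float(np.mean(pred == (labels > 0.5)))
print(accuracy)  # high, since the feature is linearly encoded
```

In the actual experiments one such probe would be trained per (feature, epoch) pair, so that changes in probe accuracy over epochs track how typological information in the embeddings evolves.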
  25. 25. Tracking Typological Traits (Uralic languages held out) 29 • Some typological features can be predicted with high accuracy for the unseen Uralic languages.
  26. 26. Cross-lingual sharing with language embeddings: Summary ● Conclusions ● Sharing high-level features more useful than low-level features ● If languages are unrelated, sharing low-level features hurts performance ● Language embeddings help ● Only tested for (selected) language pairs ● Sharing for more languages ● Only tested for selected tasks (parsing, PoS tagging) ● Language embeddings pre-trained or trained end-to-end ● Soft or hard sharing based on typological KBs? 30
  27. 27. From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings Johannes Bjerva, Isabelle Augenstein NAACL HLT 2018 31
  28. 28. Language Embeddings in Deep Neural Networks 32 Do language embeddings contain typological information? - Predict typological features - Study unsupervised vs. fine-tuned embeddings
  29. 29. Research Questions ● RQ 1: Which typological properties are encoded in task- specific distributed language representations, and can we predict phonological, morphological and syntactic properties of languages using such representations? ● RQ 2: To what extent do the encoded properties change as the representations are fine-tuned for tasks at different linguistic levels? ● RQ 3: How are language similarities encoded in fine-tuned language embeddings? 33
  30. 30. Phonological Features 34 ● 20 features ● E.g. descriptions of the consonant and vowel inventories, presence of tone and stress markers
  31. 31. Morphological Features 35 ● 41 features ● Features from morphological and nominal chapter ● E.g. number of genders, usage of definite and indefinite articles and reduplication
  32. 32. Word Order Features 36 ● 56 features ● Encode ordering of subjects, objects and verbs
  33. 33. Experimental Setup 37 Data ● Pre-trained language embeddings (Östling and Tiedemann, 2017) ● Task-specific datasets: grapheme-to-phoneme (G2P), phonological reconstruction (ASJP), morphological inflection (SIGMORPHON), part-of-speech tagging (UD)
Dataset    | Class      | Languages (task) | Languages (task ∩ pre-trained)
G2P        | Phonology  | 311              | 102
ASJP       | Phonology  | 4664             | 824
SIGMORPHON | Morphology | 52               | 29
UD         | Syntax     | 50               | 27
  34. 34. Experimental Setup 38 Method ● Fine-tune language embeddings on grapheme-to-phoneme (G2P), phonological reconstruction (ASJP), morphological inflection (SIGMORPHON), part-of-speech tagging (UD) ○ Train supervised seq2seq models on the G2P, ASJP and SIGMORPHON tasks ○ Train a sequence labelling model on the UD task ● Predict typological properties with a kNN model
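The kNN prediction step might look like the sketch below. The two-dimensional embeddings and the word-order labels are invented for illustration, not WALS data; the idea is just that a language's typological feature value is taken by majority vote among its nearest neighbours in embedding space.

```python
import numpy as np

# Toy language embeddings with an invented word-order feature.
train_embeddings = np.array([
    [0.9, 0.1], [0.8, 0.2],   # two "SOV" languages
    [0.1, 0.9], [0.2, 0.8],   # two "SVO" languages
])
train_labels = ["SOV", "SOV", "SVO", "SVO"]

def knn_predict(query, X, y, k=3):
    """Predict a typological class by majority vote over the k nearest
    language embeddings (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [y[i] for i in nearest]
    return max(set(votes), key=votes.count)

print(knn_predict(np.array([0.85, 0.15]), train_embeddings, train_labels))  # SOV
```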
  35. 35. Experimental Setup 39 Seq2Seq Model
  36. 36. Part-of-Speech Tagging (UD) 47 - Improvements for all experimental settings -> Pre-trained and fine-tuned language embeddings encode features relevant to word order
System              | Random lang/feat pairs (word order features) | Random lang/feat pairs (all features)
Most frequent class | 67.81%  | 82.93%
k-NN (pre-trained)  | 76.66%  | 82.69%
k-NN (fine-tuned)   | *80.81% | 83.55%
  37. 37. Conclusions 50 - Language embeddings can encode typological features - Works for morphological inflection and PoS tagging - Does not work for phonological tasks - We can predict typological features for unseen language families with high accuracies - G2P task: phonological differences between otherwise similar languages (e.g. Norwegian Bokmål and Danish) are accurately encoded
  38. 38. What do Language Representations Really Represent? Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein Computational Linguistics 2019 51
  39. 39. Language Representations encode Language Similarities 52 • Similar languages – similar representations • ...similar how? • Can reconstruct language family trees (Rabinovich et al. 2017)! • So... Language family (genetic) similarity?
  40. 40. What Do Language Representations Really Represent? 53 Structural distance? Family distance? Geographical distance? [Figure 1: Language representations (en, fr, es, pt, de, nl) in a two-dimensional space. What do their similarities represent?]
  41. 41. Language Representations from Monolingual Texts 54 • Input: official translations from EU languages to English (EuroParl) • Train multilingual LM on various levels of abstraction (raw text, POS tags, dependency relations) • Evaluate resulting language representations (Lraw, LPOS, LDepRel) [Figure 2: Problem illustration. Given official translations from EU languages to English, we train multilingual language models on various levels of abstraction, encoding the source languages. The resulting source-language representations are evaluated.]
  42. 42. Tree Distance Evaluation 55 • Hierarchical clustering of language embeddings • Compare resulting trees with gold phylogenetic trees • Hierarchical clustering of cosine distances (Rabinovich et al. 2017)
Table 1: Tree distance evaluation (lower is better)
Condition                        | Mean  | St.d.
Raw text (LM-Raw)                | 0.527 | -
Function words and POS (LM-Func) | 0.556 | -
Only POS (LM-POS)                | 0.517 | -
Phrase-structure (LM-Phrase)     | 0.361 | -
Dependency Relations (LM-Deprel) | 0.321 | -
POS trigrams (ROW17)             | 0.353 | 0.06
Random (ROW17)                   | 0.724 | 0.07
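A minimal sketch of the clustering step: agglomerative (single-linkage) clustering over cosine distances between language embeddings, with merges recorded as nested tuples. The four 2-d vectors are toy stand-ins for learned embeddings, chosen so that the two Scandinavian and the two Finnic languages group together.

```python
import numpy as np

# Toy language embeddings: da/no similar, fi/et similar.
langs = ["da", "no", "fi", "et"]
E = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])

def cosine_dist(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Greedy single-linkage agglomeration; each cluster is (tree, member indices).
clusters = [(lang, [i]) for i, lang in enumerate(langs)]
while len(clusters) > 1:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: min(
            cosine_dist(E[p], E[q])
            for p in clusters[ab[0]][1]
            for q in clusters[ab[1]][1]
        ),
    )
    merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

tree = clusters[0][0]
print(tree)  # (('da', 'no'), ('fi', 'et')) -- related languages merge first
```

The resulting binary tree can then be compared against a gold phylogenetic tree with a tree-distance metric, as in the evaluation on this slide.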
  43. 43. Distance Measures 56 • Family distance (following Rabinovich et al. 2017) • Geographic distance • Using Glottolog geocoordinates (Hammarström et al. 2017) • Structural distance
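The geographic distance measure can be sketched with the standard haversine formula over speaker-community coordinates; the coordinates below are approximate stand-ins for Glottolog geocoordinates, used only to illustrate the computation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Approximate coordinates: Helsinki (fi), Tallinn (et), Madrid (es).
d_fi_et = haversine_km(60.17, 24.94, 59.44, 24.75)
d_fi_es = haversine_km(60.17, 24.94, 40.42, -3.70)
print(d_fi_et < d_fi_es)  # True: Estonian is geographically closer to Finnish
```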
  44. 44. Analysis of Similarities 57 • Language embedding similarities most strongly correlate with structural similarities • Less strong correlation with genetic similarities, even though phylogenetic trees can be faithfully reconstructed (Rabinovich et al. 2017) [Figure 4: Correlations between similarities (Genetic, Geo and Struct.) and language representations (Raw, Func, POS, Phrase, Deprel). Significance at p < 0.001 indicated by *.]
  45. 45. Part 2: Typological Knowledge Bases 58
  46. 46. A Probabilistic Generative Model of Linguistic Typology Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein NAACL 2019 59
  47. 47. Languages are different 61 eat an apple
  48. 48. Languages are different 62 eat an apple ringo-wo taberu
  49. 49. Differences between languages are often correlated 63 eat an apple ringo-wo taberu go to NAACL
  50. 50. Differences are often correlated 64 eat an apple ringo-wo taberu go to NAACL
  51. 51. Differences are often correlated 65 eat an apple go to NAACL NAACL -ni iku NAACL ringo-wo taberu
  52. 52. Linguistic Typology and Greenberg’s Universals 66 VO languages have prepositions OV languages have postpositions
  53. 53. Correlations between Word Order and Adpositions 67
  54. 54. Contributions • Greenberg’s universals are binary – However, correlations are rarely 100%, so we implement a probabilisation of typology • Framed as typological collaborative filtering, exploiting correlations between languages and features • We exploit raw linguistic data by adding a semi-supervised extension 68
  55. 55. World Atlas of Language Structures (WALS) 70 • 2,500 languages • 192 features Feature 81A – Order of Subject, Object and Verb
  56. 56. WALS is Sparse and Skewed 71 • Sparse: Most languages are covered by only a handful of features • Skewed: A few features have much wider coverage than others
  57. 57. World Atlas of Language Structures (WALS) 72 • 2,500 languages -> 800 languages • 192 features -> 160 features Feature 81A – Order of Subject, Object and Verb
  58. 58. Probabilistic Typology as Matrix Factorisation 73 Users Movies
  59. 59. Probabilistic Typology as Matrix Factorisation 74 Users Users Movies Ratings
  60. 60. Probabilistic Typology as Matrix Factorisation 75 [Figure: partially observed user-movie rating matrix]
  61. 61. Probabilistic Typology as Matrix Factorisation 76 [Figure: rating matrix factorised into user vectors and movie vectors]
  62. 62. Matrix Completion: Collaborative Filtering 77 Dot product: [1, -4, 3] · [-5, 2, 1] = -10
  63. 63. Probabilistic Typology as Matrix Factorisation 78 [Figure: observed cells reconstructed from user and movie vectors]
  64. 64. Probabilistic Typology as Matrix Factorisation 79 [Figure: missing cells completed from user and movie vectors]
  65. 65. Probabilistic Typology as Matrix Factorisation 80 Languages Typological Features
  66. 66. Probabilistic Typology as Matrix Factorisation 84 Typological Features
  67. 67. Probabilistic Typology as Matrix Factorisation 86
  68. 68. Probabilistic Typology as Matrix Factorisation 87
  69. 69. Typological Feature Prediction 88
  70. 70. Probabilistic Typology as Matrix Factorisation 89 [Diagram: dot product of a language vector and a feature vector (e.g. 4), passed through a sigmoid (e.g. 0.9)]
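Putting the last few slides together, a minimal sketch of typological collaborative filtering: the dot product of a language vector and a feature vector is squashed through a sigmoid to give the probability that the feature holds, and the vectors are learned from observed cells so that missing cells can be filled in. All values and observations below are invented for illustration; this is not the paper's model code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The dot-product example from the slides: [1,-4,3] . [-5,2,1] = -10,
# i.e. a very low probability that this (language, feature) cell is 1.
language_vec = np.array([1.0, -4.0, 3.0])
feature_vec = np.array([-5.0, 2.0, 1.0])
score = language_vec @ feature_vec  # -10.0

# Minimal matrix completion via SGD on the logistic loss: learn low-rank
# language (L) and feature (F) vectors so that sigmoid(l . f) matches the
# observed binary cells.
rng = np.random.default_rng(0)
obs = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (2, 1): 1}  # (language, feature) -> value
L = rng.normal(scale=0.1, size=(3, 4))  # one vector per language
F = rng.normal(scale=0.1, size=(2, 4))  # one vector per feature
for _ in range(500):
    for (i, j), y in obs.items():
        g = sigmoid(L[i] @ F[j]) - y  # gradient of the logistic loss
        L[i], F[j] = L[i] - 0.5 * g * F[j], F[j] - 0.5 * g * L[i]

fit = all((sigmoid(L[i] @ F[j]) > 0.5) == bool(y) for (i, j), y in obs.items())
print(score, fit)
```

Once trained, the same dot-product-plus-sigmoid gives a probability for any unobserved (language, feature) cell, which is the feature-prediction step in the talk.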
  71. 71. Typological Feature Prediction 90
  72. 72. Representing Languages – Semi-supervised Extension 95 [Diagram: character-level language model, predicting "k a t" from "<s> k a"]
  73. 73. Representing Languages – Semi-supervised Extension 96 [Diagram: character-level language model, predicting "k a t" from "<s> k a"]
  74. 74. Semi-supervised Extension - Interpretability 97 Typological Feature Prediction Multilingual Language Modelling Compressing linguistic information
  75. 75. Evaluation • Controlling for Genetic Relationships • Train on all out-of-family data • [0, 1, 5, 10, 20]% in-family data • Observed features in matrix • With/without pre- trained language embeddings 98
  76. 76. Feature Prediction across Language Families 99
  77. 77. Effect of Pre-training (Semi-supervised Extension) 100
  78. 78. Uncovering Probabilistic Implications in Typological Knowledge Bases Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein ACL 2019 108
  79. 79. Linguistic Typology and Greenberg’s Universals 109 VO languages have prepositions OV languages have postpositions
  80. 80. From Correlations to Probabilistic Implications 110 [Figure 1: Visualisation of a section of the induced graphical model. Observing the features in the left-most nodes (SV, OV, and Noun-Adjective), can we correctly infer the value of the right-most node (SVO)?]
  81. 81. Feature Prediction Accuracies 111
Table 1: Accuracies for feature prediction in a typologically diverse test set, across number of implicants used. Note that the numbers are not comparable across columns nor to the baselines, since each makes a different number of predictions.
Feature area       | 2    | 3    | 4    | 5    | 6    (N implicants)
Phonology          | 0.75 | 0.82 | 0.84 | 0.86 | 0.89
Morphology         | 0.77 | 0.85 | 0.87 | 0.70 | 0.82
Nominal Categories | 0.72 | 0.83 | 0.80 | 0.84 | 0.81
Nominal Syntax     | 0.77 | 0.89 | 0.85 | 0.89 | 0.81
Verbal Categories  | 0.80 | 0.84 | 0.80 | 0.86 | 0.90
Word Order         | 0.74 | 0.86 | 0.86 | 0.86 | 0.93
Clause             | 0.75 | 0.81 | 0.84 | 0.85 | 0.84
Complex            | 0.82 | 0.83 | 0.87 | 0.93 | 0.84
Lexical            | 0.83 | 0.76 | 0.75 | 0.85 | 0.79
Mean               | 0.77 | 0.83 | 0.83 | 0.85 | 0.85
Baselines: Most freq. 0.30 | Pairwise 0.77 | PRA 0.81 | Language embeddings 0.85
  82. 82. Probabilistic Implications Found 112
Table 2: Hand-picked implications. In cases where the same implication is covered by Daumé III and Campbell (2007), we borrow their analysis (marked with *).
#   | Implicant(s)                                               | Implicand
1*  | Postpositions                                              | Genitive-Noun (Greenberg #2a)
2*  | Postpositions                                              | OV (Greenberg #4)
3   | OV                                                         | SV
4*  | Postpositions                                              | SV
5*  | Prepositions                                               | VO (Greenberg #4)
6*  | Prepositions                                               | Initial subord. word (Lehmann)
7*  | Adjective-Noun, Postpositions                              | Demonstrative-Noun
8*  | Genitive-Noun, Adjective-Noun                              | OV
9   | SV, OV, Noun-Adjective                                     | SOV
10  | Degree word-Adjective, Noun–Relative Clause, Numeral-Noun  | VO and SVO
11  | Relative Clause–Noun, Adjective-Degree word, Noun-Numeral  | OV and SOV
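A minimal sketch of how such a probabilistic implication can be read off a feature table: estimate P(implicand | implicant) by counting co-occurrences among the languages where the implicant holds. The toy language rows below are invented for illustration and are far simpler than the graphical model in the paper.

```python
# Toy WALS-style feature table: one dict of binary features per language.
langs = [
    {"OV": 1, "Postpositions": 1},
    {"OV": 1, "Postpositions": 1},
    {"OV": 1, "Postpositions": 0},
    {"OV": 0, "Postpositions": 0},
]

def implication_strength(data, implicant, implicand):
    """Empirical P(implicand = 1 | implicant = 1), or None if unobserved."""
    having = [lang for lang in data if lang.get(implicant) == 1]
    if not having:
        return None
    return sum(lang[implicand] for lang in having) / len(having)

p = implication_strength(langs, "OV", "Postpositions")
print(p)  # 2/3 of OV languages have postpositions in this toy sample
```

An implication like Greenberg's "OV languages have postpositions" would then be reported not as a binary universal but with its estimated probability, which is the probabilisation the talk argues for.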
  83. 83. Wrap-Up 113
  84. 84. Conclusions: This Talk 114 Part 1: Language Representations - Improve performance for multilingual sharing (de Lhoneux et al. 2018) - Encode typological properties - task-specific fine-tuned ones even more so than ones only obtained using language modelling (Bjerva & Augenstein 2018a,b) - Can be used to reconstruct phylogenetic trees (Rabinovich et al. 2017, Östling & Tiedemann 2017) - … but actually mostly represent structural similarities between languages (Bjerva et al., 2019a)
  85. 85. Conclusions: This Talk 115 Part 2: Typological Knowledge Bases - Can be populated automatically with high accuracy using KBP methods (Bjerva et al. 2019b) - Language embeddings further improve performance - Can be used to discover probabilistic implications - With one or multiple implicants - Including Greenberg universals
  86. 86. Future work 116 • Improve multilingual modelling • E.g., share morphologically relevant parameters for morphologically similar languages • Use typological knowledge bases for multilingual modelling
  87. 87. Presented Papers Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard. Parameter sharing between dependency parsers for related languages. EMNLP 2018. Johannes Bjerva, Isabelle Augenstein. Tracking Typological Traits of Uralic Languages in Distributed Language Representations. Fourth International Workshop on Computational Linguistics for Uralic Languages (IWCLUL 2018). Johannes Bjerva, Isabelle Augenstein. Unsupervised Linguistic Typology at Different Levels with Language Embeddings. NAACL HLT 2018. Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein. What do Language Representations Really Represent? Computational Linguistics, Vol. 45, No. 2, June 2019. Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein. A Probabilistic Generative Model of Linguistic Typology. NAACL 2019. Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein. Uncovering Probabilistic Implications in Typological Knowledge Bases. ACL 2019. 117
  88. 88. Thanks to my collaborators and advisees! Johannes Bjerva, Ryan Cotterell, Yova Kementchedjhieva, Miryam de Lhoneux, Robert Östling, Maria Han Veiga, Jörg Tiedemann 118
  89. 89. Thank you! augenstein@di.ku.dk @IAugenstein 119
