
What can typological knowledge bases and language representations tell us about linguistic properties?


One of the core challenges in typology is to record properties of languages in a structured way. As a result of manual efforts, typological knowledge bases have emerged, which contain information about languages’ phonological, morphological and syntactic properties, as well as about language families. Ideally, such typological knowledge bases would provide useful information for multilingual NLP models to learn how to selectively share parameters.
A related area of research suggests a different way of encoding properties of languages, namely to learn language representation vectors directly from text documents.
In this talk, I will analyse and contrast these two ways of encoding linguistic properties, as well as present research on how the two can benefit one another.



  1. 1. Typ-NLP Workshop 1 August 2019 What can typological knowledge bases and language representations tell us about linguistic properties? Isabelle Augenstein* augenstein@di.ku.dk @IAugenstein http://isabelleaugenstein.github.io/ *Credit for many of the slides: Johannes Bjerva
  2. 2. Linguistic Typology 2 ● ‘The systematic study and comparison of language structures’ (Velupillai, 2012) ● Long history (Herder, 1772; von der Gabelentz, 1891; …) ● Computational approaches (Dunn et al., 2011; Wälchli, 2014; Östling, 2015, ...)
  3. 3. Why Computational Typology? 3 ● Answer linguistic research questions on large scale ● About relationships between languages ● About relationships between structural features of languages ● Facilitate multilingual learning ○ Cross-lingual transfer ○ Few-shot or zero-shot learning
  4. 4. How to Obtain Typological Knowledge? 4 ● Discrete representation of language features in typological knowledge bases ● World Atlas of Language Structures (WALS) ● Continuous representation of language features via language embeddings ● Learned via language modelling
  5. 5. Why Computational Typology? 5 ● Answer linguistic research questions on large scale ● Multilingual learning ○ Language representations ○ Cross-lingual transfer ○ Few-shot or zero-shot learning ● This talk: ○ Features in the World Atlas of Language Structures (WALS) ○ Computational Typology via unsupervised modelling of languages in neural networks
  6. 6. 6 World Atlas of Language Structures (WALS)
  7. 7. 7 World Atlas of Language Structures (WALS)
  8. 8. Can language representations be learned from data? Resources that exist for many languages: ● Universal Dependencies (>60 languages) ● UniMorph (>50 languages) ● New Testament translations (>1,000 languages) ● Automated Similarity Judgment Program (>4,500 languages) 8
  9. 9. Multilingual NLP and Language Representations ● No explicit representation ○ Multilingual Word Embeddings ● Google’s “Enabling zero-shot learning” NMT trick ○ Language given explicitly in input ● One-hot encodings ○ Languages represented as a sparse vector ● Language Embeddings ○ Languages represented as a distributed vector 9 (Östling and Tiedemann, 2017)
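The contrast between one-hot encodings and distributed language embeddings on this slide can be sketched as follows. The vectors are toy stand-ins, not the embeddings of Östling and Tiedemann (2017); the point is only that one-hot vectors keep all languages equidistant, while dense vectors can place related languages close together and be fed into the model alongside each token.

```python
import numpy as np

languages = ["fin", "est", "sme", "hun"]

# One-hot: sparse, and every pair of languages is equally distant,
# so relatedness between languages cannot be expressed.
one_hot = {lang: np.eye(len(languages))[i] for i, lang in enumerate(languages)}

# Dense language embeddings (random stand-ins here; in practice learned
# jointly with a multilingual model): related languages can end up close.
rng = np.random.default_rng(0)
embedding = {lang: rng.normal(size=8) for lang in languages}

def model_input(token_vec, lang):
    """Concatenate a token representation with the language embedding,
    in the spirit of Östling and Tiedemann (2017)."""
    return np.concatenate([token_vec, embedding[lang]])

x = model_input(np.zeros(16), "fin")
print(x.shape)  # (24,)
```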
  10. 10. Experimental Setup Data ● Pre-trained language embeddings (Östling and Tiedemann, 2017) ○ Trained via Language Modelling on New Testament data ● PoS annotation from Universal Dependencies for ○ Finnish ○ Estonian ○ North Sami ○ Hungarian Task ● Fine-tune language embeddings on PoS tagging ● Investigate how typological properties are encoded in these for four Uralic languages 10
  11. 11. Distributed Language Representations 11 • Language Embeddings • Analogous to Word Embeddings • Can be learned in a neural network without supervision
  12. 12. Language Embeddings in Deep Neural Networks 12
  13. 13. Talk Overview Part 1: Language Embeddings - Do they aid multilingual parameter sharing? - Do they encode typological properties? - What types of similarities between languages do they encode? Part 2: Typological Knowledge Bases - Can they be populated automatically? - Can they be used to discover typological implications? 13
  14. 14. Part 1: Language Embeddings 14
  15. 15. Parameter sharing between dependency parsers for related languages Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard EMNLP 2018 15
  16. 16. Cross-lingual sharing with language embeddings ● Do language embeddings help to learn soft sharing strategies? ● Use case: transition-based dependency parsing ● Types of parameters: ● Character embeddings ● Word embeddings ● Transition parameters (MLP) ● Ablation with language embedding concatenated with char, word or transition vector 16
  17. 17. Cross-lingual sharing with language embeddings 17
Table 1: Dataset characteristics
Lang | Tokens  | Family   | Word order
ar   | 208,932 | Semitic  | VSO
he   | 161,685 | Semitic  | SVO
et   | 60,393  | Finnic   | SVO
fi   | 67,258  | Finnic   | SVO
hr   | 109,965 | Slavic   | SVO
ru   | 90,170  | Slavic   | SVO
it   | 113,825 | Romance  | SVO
es   | 154,844 | Romance  | SVO
nl   | 75,796  | Germanic | No dom. order
no   | 76,622  | Germanic | SVO
  18. 18. Cross-lingual sharing with language embeddings 19 [Chart: average scores per sharing strategy] • Mono: single-task baseline • Lang-best: best sharing strategy for each language • Best: best sharing strategy across languages (char not shared, word shared, transition shared with language embedding) • All: all parameters shared • Soft: sharing learned using language embeddings
  19. 19. Related vs. Unrelated Languages 22 [Chart: average scores per sharing strategy] • Mono: single-task baseline • Lang-best: best sharing strategy for each language • Best: best sharing strategy across languages (char not shared, word shared, transition shared with language embedding) • All: all parameters shared • Soft: sharing learned using language embeddings
  20. 20. Tracking Typological Traits of Uralic Languages in Distributed Language Representations Johannes Bjerva, Isabelle Augenstein IWCLUL 2018 24
  21. 21. Language Embeddings in Deep Neural Networks 25 1. Do language embeddings aid multilingual modelling? 2. Do language embeddings contain typological information?
  22. 22. Model performance (Monolingual PoS tagging) 26 • Compared to most frequent class baseline (black line) • Model transfer between Finnic languages relatively successful • Little effect from language embeddings (to be expected)
  23. 23. Model performance (Multilingual PoS tagging) 27 • Compared to monolingual baseline (black line) • Model transfer between Finnic languages outperforms monolingual baseline • Language embeddings improve multilingual modelling
  24. 24. Tracking Typological Traits (full language sample) 28 • Baseline: Most frequent typological class in sample • Language embeddings saved at each training epoch • Separate Logistic Regression classifier trained for each feature and epoch • Input: Language embedding • Output: Typological class • Typological features encoded in language embeddings change during training
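The probing setup described on this slide can be sketched roughly as below. The language embeddings and the binary typological feature are synthetic stand-ins (in the talk, embeddings are saved per epoch and one classifier is trained per feature); the logistic-regression probe is a minimal hand-rolled version, not the authors' code.

```python
import numpy as np

# Synthetic stand-ins: 40 "language embeddings" and one binary
# typological feature that happens to be linearly encoded in them.
rng = np.random.default_rng(1)
n_langs, dim = 40, 16
embeddings = rng.normal(size=(n_langs, dim))
labels = (embeddings[:, 0] > 0).astype(float)

def fit_probe(X, y, epochs=200, lr=0.5):
    """Hand-rolled logistic-regression probe (full-batch gradient descent)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_probe(embeddings, labels)
pred = (embeddings @ w + b) > 0
accuracy = float(np.mean(pred == (labels > 0.5)))
print(accuracy)  # high, since the feature is linearly encoded
```

In the actual experiments one such probe would be trained per (feature, epoch) pair, so that changes in probe accuracy over epochs track how typological information in the embeddings evolves.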
  25. 25. Tracking Typological Traits (Uralic languages held out) 29 • Some typological features can be predicted with high accuracy for the unseen Uralic languages.
  26. 26. Cross-lingual sharing with language embeddings: Summary ● Conclusions ● Sharing high-level features more useful than low-level features ● If languages are unrelated, sharing low-level features hurts performance ● Language embeddings help ● Only tested for (selected) language pairs ● Sharing for more languages ● Only tested for selected tasks (parsing, PoS tagging) ● Language embeddings pre-trained or trained end-to-end ● Soft or hard sharing based on typological KBs? 30
  27. 27. From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings Johannes Bjerva, Isabelle Augenstein NAACL HLT 2018 31
  28. 28. Language Embeddings in Deep Neural Networks 32 Do language embeddings contain typological information? - Predict typological features - Study unsupervised vs. fine-tuned embeddings
  29. 29. Research Questions ● RQ 1: Which typological properties are encoded in task- specific distributed language representations, and can we predict phonological, morphological and syntactic properties of languages using such representations? ● RQ 2: To what extent do the encoded properties change as the representations are fine-tuned for tasks at different linguistic levels? ● RQ 3: How are language similarities encoded in fine-tuned language embeddings? 33
  30. 30. Phonological Features 34 ● 20 features ● E.g. descriptions of the consonant and vowel inventories, presence of tone and stress markers
  31. 31. Morphological Features 35 ● 41 features ● Features from morphological and nominal chapter ● E.g. number of genders, usage of definite and indefinite articles and reduplication
  32. 32. Word Order Features 36 ● 56 features ● Encode ordering of subjects, objects and verbs
  33. 33. Experimental Setup 37 Data ● Pre-trained language embeddings (Östling and Tiedemann, 2017) ● Task-specific datasets: grapheme-to-phoneme (G2P), phonological reconstruction (ASJP), morphological inflection (SIGMORPHON), part-of-speech tagging (UD)
Dataset    | Class      | Languages (task) | Languages (task ∩ pre-trained)
G2P        | Phonology  | 311              | 102
ASJP       | Phonology  | 4664             | 824
SIGMORPHON | Morphology | 52               | 29
UD         | Syntax     | 50               | 27
  34. 34. Experimental Setup 38 Method ● Fine-tune language embeddings on grapheme-to-phoneme (G2P), phonological reconstruction (ASJP), morphological inflection (SIGMORPHON), part-of-speech tagging (UD) ○ Train supervised seq2seq models on the G2P, ASJP and SIGMORPHON tasks ○ Train a sequence labelling model on the UD task ● Predict typological properties with a kNN model
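The kNN prediction step might look like the sketch below. The two-dimensional embeddings and the word-order labels are invented for illustration, not WALS data; the idea is just that a language's typological feature value is taken by majority vote among its nearest neighbours in embedding space.

```python
import numpy as np

# Toy language embeddings with an invented word-order feature.
train_embeddings = np.array([
    [0.9, 0.1], [0.8, 0.2],   # two "SOV" languages
    [0.1, 0.9], [0.2, 0.8],   # two "SVO" languages
])
train_labels = ["SOV", "SOV", "SVO", "SVO"]

def knn_predict(query, X, y, k=3):
    """Predict a typological class by majority vote over the k nearest
    language embeddings (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [y[i] for i in nearest]
    return max(set(votes), key=votes.count)

print(knn_predict(np.array([0.85, 0.15]), train_embeddings, train_labels))  # SOV
```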
  35. 35. Experimental Setup 39 Seq2Seq Model
  36. 36. Part-of-Speech Tagging (UD) 47 - Improvements for all experimental settings -> Pre-trained and fine-tuned language embeddings encode features relevant to word order
System              | Random lang/feat pairs (word order features) | Random lang/feat pairs (all features)
Most frequent class | 67.81%  | 82.93%
k-NN (pre-trained)  | 76.66%  | 82.69%
k-NN (fine-tuned)   | *80.81% | 83.55%
  37. 37. Conclusions 50 - Language embeddings can encode typological features - Works for morphological inflection and PoS tagging - Does not work for phonological tasks - We can predict typological features for unseen language families with high accuracies - G2P task: phonological differences between otherwise similar languages (e.g. Norwegian Bokmål and Danish) are accurately encoded
  38. 38. What do Language Representations Really Represent? Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein Computational Linguistics 2019 51
  39. 39. Language Representations encode Language Similarities 52 • Similar languages – similar representations • ...similar how? • Can reconstruct language family trees (Rabinovich et al. 2017)! • So... Language family (genetic) similarity?
  40. 40. What Do Language Representations Really Represent? 53 Structural distance? Family distance? Geographical distance? [Figure 1: Language representations (en, fr, es, pt, de, nl) in a two-dimensional space. What do their similarities represent?]
  41. 41. Language Representations from Monolingual Texts 54 • Input: official translations from EU languages to English (EuroParl) • Train multilingual LM on various levels of abstraction (raw text, POS tags, dependency relations) • Evaluate resulting language representations (Lraw, LPOS, LDepRel) [Figure 2: Problem illustration. Given official translations from EU languages to English, we train multilingual language models on various levels of abstraction, encoding the source languages. The resulting source-language representations are evaluated.]
  42. 42. Tree Distance Evaluation 55 • Hierarchical clustering of language embeddings • Compare resulting trees with gold phylogenetic trees • Hierarchical clustering of cosine distances (Rabinovich et al. 2017)
Table 1: Tree distance evaluation (lower is better)
Condition                        | Mean  | St.d.
Raw text (LM-Raw)                | 0.527 | -
Function words and POS (LM-Func) | 0.556 | -
Only POS (LM-POS)                | 0.517 | -
Phrase-structure (LM-Phrase)     | 0.361 | -
Dependency Relations (LM-Deprel) | 0.321 | -
POS trigrams (ROW17)             | 0.353 | 0.06
Random (ROW17)                   | 0.724 | 0.07
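A minimal sketch of the clustering step: agglomerative (single-linkage) clustering over cosine distances between language embeddings, with merges recorded as nested tuples. The four 2-d vectors are toy stand-ins for learned embeddings, chosen so that the two Scandinavian and the two Finnic languages group together.

```python
import numpy as np

# Toy language embeddings: da/no similar, fi/et similar.
langs = ["da", "no", "fi", "et"]
E = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])

def cosine_dist(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Greedy single-linkage agglomeration; each cluster is (tree, member indices).
clusters = [(lang, [i]) for i, lang in enumerate(langs)]
while len(clusters) > 1:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: min(
            cosine_dist(E[p], E[q])
            for p in clusters[ab[0]][1]
            for q in clusters[ab[1]][1]
        ),
    )
    merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

tree = clusters[0][0]
print(tree)  # (('da', 'no'), ('fi', 'et')) -- related languages merge first
```

The resulting binary tree can then be compared against a gold phylogenetic tree with a tree-distance metric, as in the evaluation on this slide.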
  43. 43. Distance Measures 56 • Family distance (following Rabinovich et al. 2017) • Geographic distance • Using Glottolog geocoordinates (Hammarström et al. 2017) • Structural distance
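The geographic distance measure can be sketched with the standard haversine formula over speaker-community coordinates; the coordinates below are approximate stand-ins for Glottolog geocoordinates, used only to illustrate the computation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Approximate coordinates: Helsinki (fi), Tallinn (et), Madrid (es).
d_fi_et = haversine_km(60.17, 24.94, 59.44, 24.75)
d_fi_es = haversine_km(60.17, 24.94, 40.42, -3.70)
print(d_fi_et < d_fi_es)  # True: Estonian is geographically closer to Finnish
```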
  44. 44. Analysis of Similarities 57 • Language embedding similarities most strongly correlate with structural similarities • Less strong correlation with genetic similarities, even though phylogenetic trees can be faithfully reconstructed (Rabinovich et al. 2017) [Figure 4: Correlations between similarities (Genetic, Geo and Struct.) and language representations (Raw, Func, POS, Phrase, Deprel). Significance at p < 0.001 indicated by *.]
  45. 45. Part 2: Typological Knowledge Bases 58
  46. 46. A Probabilistic Generative Model of Linguistic Typology Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein NAACL 2019 59
  47. 47. Languages are different 61 eat an apple
  48. 48. Languages are different 62 eat an apple ringo-wo taberu
  49. 49. Differences between languages are often correlated 63 eat an apple ringo-wo taberu go to NAACL
  50. 50. Differences are often correlated 64 eat an apple ringo-wo taberu go to NAACL
  51. 51. Differences are often correlated 65 eat an apple go to NAACL NAACL -ni iku NAACL ringo-wo taberu
  52. 52. Linguistic Typology and Greenberg’s Universals 66 VO languages have prepositions OV languages have postpositions
  53. 53. Correlations between Word Order and Adpositions 67
  54. 54. Contributions • Greenberg’s universals are binary – However, correlations are rarely 100%, so we implement a probabilisation of typology • Framed as typological collaborative filtering, exploiting correlations between languages and features • We exploit raw linguistic data by adding a semi-supervised extension 68
  55. 55. World Atlas of Language Structures (WALS) 70 • 2,500 languages • 192 features Feature 81A – Order of Subject, Object and Verb
  56. 56. WALS is Sparse and Skewed 71 • Sparse: Most languages are covered by only a handful of features • Skewed: A few features have much wider coverage than others
  57. 57. World Atlas of Language Structures (WALS) 72 • 2,500 languages -> 800 languages • 192 features -> 160 features Feature 81A – Order of Subject, Object and Verb
  58. 58. Probabilistic Typology as Matrix Factorisation 73 Users Movies
  59. 59. Probabilistic Typology as Matrix Factorisation 74 Users Users Movies Ratings
  60. 60. Probabilistic Typology as Matrix Factorisation 75 [Figure: partially observed user-movie rating matrix]
  61. 61. Probabilistic Typology as Matrix Factorisation 76 [Figure: rating matrix factorised into user vectors and movie vectors]
  62. 62. Matrix Completion: Collaborative Filtering 77 Dot product: [1, -4, 3] · [-5, 2, 1] = -10
  63. 63. Probabilistic Typology as Matrix Factorisation 78 [Figure: observed cells reconstructed from user and movie vectors]
  64. 64. Probabilistic Typology as Matrix Factorisation 79 [Figure: missing cells completed from user and movie vectors]
  65. 65. Probabilistic Typology as Matrix Factorisation 80 Languages Typological Features
  66. 66. Probabilistic Typology as Matrix Factorisation 84 Typological Features
  67. 67. Probabilistic Typology as Matrix Factorisation 86
  68. 68. Probabilistic Typology as Matrix Factorisation 87
  69. 69. Typological Feature Prediction 88
  70. 70. Probabilistic Typology as Matrix Factorisation 89 [Diagram: dot product of a language vector and a feature vector (e.g. 4), passed through a sigmoid (e.g. 0.9)]
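Putting the last few slides together, a minimal sketch of typological collaborative filtering: the dot product of a language vector and a feature vector is squashed through a sigmoid to give the probability that the feature holds, and the vectors are learned from observed cells so that missing cells can be filled in. All values and observations below are invented for illustration; this is not the paper's model code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The dot-product example from the slides: [1,-4,3] . [-5,2,1] = -10,
# i.e. a very low probability that this (language, feature) cell is 1.
language_vec = np.array([1.0, -4.0, 3.0])
feature_vec = np.array([-5.0, 2.0, 1.0])
score = language_vec @ feature_vec  # -10.0

# Minimal matrix completion via SGD on the logistic loss: learn low-rank
# language (L) and feature (F) vectors so that sigmoid(l . f) matches the
# observed binary cells.
rng = np.random.default_rng(0)
obs = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (2, 1): 1}  # (language, feature) -> value
L = rng.normal(scale=0.1, size=(3, 4))  # one vector per language
F = rng.normal(scale=0.1, size=(2, 4))  # one vector per feature
for _ in range(500):
    for (i, j), y in obs.items():
        g = sigmoid(L[i] @ F[j]) - y  # gradient of the logistic loss
        L[i], F[j] = L[i] - 0.5 * g * F[j], F[j] - 0.5 * g * L[i]

fit = all((sigmoid(L[i] @ F[j]) > 0.5) == bool(y) for (i, j), y in obs.items())
print(score, fit)
```

Once trained, the same dot-product-plus-sigmoid gives a probability for any unobserved (language, feature) cell, which is the feature-prediction step in the talk.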
  71. 71. Typological Feature Prediction 90
  72. 72. Representing Languages – Semi-supervised Extension 95 [Diagram: character-level language model, predicting "k a t" from "<s> k a"]
  73. 73. Representing Languages – Semi-supervised Extension 96 [Diagram: character-level language model, predicting "k a t" from "<s> k a"]
  74. 74. Semi-supervised Extension - Interpretability 97 Typological Feature Prediction Multilingual Language Modelling Compressing linguistic information
  75. 75. Evaluation • Controlling for Genetic Relationships • Train on all out-of-family data • [0, 1, 5, 10, 20]% in-family data • Observed features in matrix • With/without pre- trained language embeddings 98
  76. 76. Feature Prediction across Language Families 99
  77. 77. Effect of Pre-training (Semi-supervised Extension) 100
  78. 78. Uncovering Probabilistic Implications in Typological Knowledge Bases Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein ACL 2019 108
  79. 79. Linguistic Typology and Greenberg’s Universals 109 VO languages have prepositions OV languages have postpositions
  80. 80. From Correlations to Probabilistic Implications 110 [Figure 1: Visualisation of a section of the induced graphical model. Observing the features in the left-most nodes (SV, OV, and Noun-Adjective), can we correctly infer the value of the right-most node (SVO)?]
  81. 81. Feature Prediction Accuracies 111
Table 1: Accuracies for feature prediction in a typologically diverse test set, across number of implicants used. Note that the numbers are not comparable across columns nor to the baselines, since each makes a different number of predictions.
Feature area       | 2    | 3    | 4    | 5    | 6    (N implicants)
Phonology          | 0.75 | 0.82 | 0.84 | 0.86 | 0.89
Morphology         | 0.77 | 0.85 | 0.87 | 0.70 | 0.82
Nominal Categories | 0.72 | 0.83 | 0.80 | 0.84 | 0.81
Nominal Syntax     | 0.77 | 0.89 | 0.85 | 0.89 | 0.81
Verbal Categories  | 0.80 | 0.84 | 0.80 | 0.86 | 0.90
Word Order         | 0.74 | 0.86 | 0.86 | 0.86 | 0.93
Clause             | 0.75 | 0.81 | 0.84 | 0.85 | 0.84
Complex            | 0.82 | 0.83 | 0.87 | 0.93 | 0.84
Lexical            | 0.83 | 0.76 | 0.75 | 0.85 | 0.79
Mean               | 0.77 | 0.83 | 0.83 | 0.85 | 0.85
Baselines: Most freq. 0.30 | Pairwise 0.77 | PRA 0.81 | Language embeddings 0.85
  82. 82. Probabilistic Implications Found 112
Table 2: Hand-picked implications. In cases where the same implication is covered by Daumé III and Campbell (2007), we borrow their analysis (marked with *).
#   | Implicant(s)                                               | Implicand
1*  | Postpositions                                              | Genitive-Noun (Greenberg #2a)
2*  | Postpositions                                              | OV (Greenberg #4)
3   | OV                                                         | SV
4*  | Postpositions                                              | SV
5*  | Prepositions                                               | VO (Greenberg #4)
6*  | Prepositions                                               | Initial subord. word (Lehmann)
7*  | Adjective-Noun, Postpositions                              | Demonstrative-Noun
8*  | Genitive-Noun, Adjective-Noun                              | OV
9   | SV, OV, Noun-Adjective                                     | SOV
10  | Degree word-Adjective, Noun–Relative Clause, Numeral-Noun  | VO and SVO
11  | Relative Clause–Noun, Adjective-Degree word, Noun-Numeral  | OV and SOV
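A minimal sketch of how such a probabilistic implication can be read off a feature table: estimate P(implicand | implicant) by counting co-occurrences among the languages where the implicant holds. The toy language rows below are invented for illustration and are far simpler than the graphical model in the paper.

```python
# Toy WALS-style feature table: one dict of binary features per language.
langs = [
    {"OV": 1, "Postpositions": 1},
    {"OV": 1, "Postpositions": 1},
    {"OV": 1, "Postpositions": 0},
    {"OV": 0, "Postpositions": 0},
]

def implication_strength(data, implicant, implicand):
    """Empirical P(implicand = 1 | implicant = 1), or None if unobserved."""
    having = [lang for lang in data if lang.get(implicant) == 1]
    if not having:
        return None
    return sum(lang[implicand] for lang in having) / len(having)

p = implication_strength(langs, "OV", "Postpositions")
print(p)  # 2/3 of OV languages have postpositions in this toy sample
```

An implication like Greenberg's "OV languages have postpositions" would then be reported not as a binary universal but with its estimated probability, which is the probabilisation the talk argues for.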
  83. 83. Wrap-Up 113
  84. 84. Conclusions: This Talk 114 Part 1: Language Representations - Improve performance for multilingual sharing (de Lhoneux et al. 2018) - Encode typological properties - task-specific fine-tuned ones even more so than ones only obtained using language modelling (Bjerva & Augenstein 2018a,b) - Can be used to reconstruct phylogenetic trees (Rabinovich et al. 2017, Östling & Tiedemann 2017) - … but actually mostly represent structural similarities between languages (Bjerva et al., 2019a)
  85. 85. Conclusions: This Talk 115 Part 2: Typological Knowledge Bases - Can be populated automatically with high accuracy using KBP methods (Bjerva et al. 2019b) - Language embeddings further improve performance - Can be used to discover probabilistic implications - With one or multiple implicants - Including Greenberg universals
  86. 86. Future work 116 • Improve multilingual modelling • E.g., share morphologically relevant parameters for morphologically similar languages • Use typological knowledge bases for multilingual modelling
  87. 87. Presented Papers Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard. Parameter sharing between dependency parsers for related languages. EMNLP 2018. Johannes Bjerva, Isabelle Augenstein. Tracking Typological Traits of Uralic Languages in Distributed Language Representations. Fourth International Workshop on Computational Linguistics for Uralic Languages (IWCLUL 2018). Johannes Bjerva, Isabelle Augenstein. Unsupervised Linguistic Typology at Different Levels with Language Embeddings. NAACL HLT 2018. Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein. What do Language Representations Really Represent? Computational Linguistics, Vol. 45, No. 2, June 2019. Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein. A Probabilistic Generative Model of Linguistic Typology. NAACL 2019. Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein. Uncovering Probabilistic Implications in Typological Knowledge Bases. ACL 2019. 117
  88. 88. Thanks to my collaborators and advisees! Johannes Bjerva, Ryan Cotterell, Yova Kementchedjhieva, Miryam de Lhoneux, Robert Östling, Maria Han Veiga, Jörg Tiedemann 118
  89. 89. Thank you! augenstein@di.ku.dk @IAugenstein 119
