
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

Paper presented at NAACL 2018. Link: https://arxiv.org/abs/1802.09913

Abstract:
============
We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.



  1. NAACL 2018. Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces. Isabelle Augenstein*, Sebastian Ruder*, Anders Søgaard (*equal contributions). augenstein@di.ku.dk @IAugenstein http://isabelleaugenstein.github.io/
  2. Problem
     - Different NLU tasks (e.g. stance detection, aspect-based sentiment analysis, natural language inference)
     - Limited training data for most individual tasks
     - However:
       - they can be modelled with the same base neural model
       - they are semantically related
       - they have similar labels
     - How can we exploit synergies between these tasks?
  3. Datasets and Tasks
     - Topic-based sentiment analysis (2-way): negative, positive
     - Topic-based sentiment analysis (5-way): highly negative, negative, neutral, positive, highly positive
     - Target-dependent sentiment analysis: negative, neutral, positive
     - Aspect-based sentiment analysis (Restaurants, Laptops): negative, neutral, positive
     - Stance detection: against, none, favor
     - Fake news detection: disagree, unrelated, discuss, agree
     - Natural language inference: contradiction, neutral, entailment
  4. Dataset Examples
     Aspect-based sentiment analysis (Restaurants)
       Text: For the price, you cannot eat this well in Manhattan
       Aspect: restaurant prices
       Label: positive
     Natural language inference
       Premise: Fun for only children
       Hypothesis: Fun for adults and children
       Label: contradiction
  5. Multi-Task Learning (architecture figure, built up across slides 5-9): separate inputs for each task; shared hidden layers; separate output layers + classification functions; negative log-likelihood objectives (a minimal sketch follows)
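To make the shared/task-specific split concrete, here is a minimal PyTorch sketch of hard parameter sharing, the base multi-task setup shown in the figure: a shared encoder feeding separate per-task output layers, each trained with a negative log-likelihood objective. The bag-of-embeddings encoder and all dimensions are illustrative assumptions, not the paper's exact sequence encoder.

```python
# Minimal sketch of multi-task learning with hard parameter sharing.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, label_sizes):
        super().__init__()
        # Shared layers: reused by every task (hard parameter sharing).
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim)   # toy encoder
        self.shared = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())
        # Separate output layer (classification function) per task.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, n) for n in label_sizes]
        )

    def forward(self, tokens, task_id):
        h = self.shared(self.embed(tokens))   # shared representation
        return self.heads[task_id](h)         # task-specific logits

model = MultiTaskModel(vocab_size=10_000, emb_dim=64, hidden_dim=128,
                       label_sizes=[2, 5, 3])   # e.g. Topic-2, Topic-5, Target
nll = nn.CrossEntropyLoss()   # negative log-likelihood over softmax logits
opt = torch.optim.Adam(model.parameters())

# Training alternates batches between tasks; the shared layers receive
# gradients from every task, each head only from its own.
tokens = torch.randint(0, 10_000, (8, 12))    # dummy batch for task 0
gold = torch.randint(0, 2, (8,))
loss = nll(model(tokens, task_id=0), gold)
opt.zero_grad(); loss.backward(); opt.step()
```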
  10. Goal: Exploiting Synergies between Tasks
      - Modelling tasks in a joint label space
      - A Label Transfer Network that learns to transfer labels between tasks
      - Semi-supervised learning, trained end-to-end with the multi-task learning model
      - Extensive evaluation on a set of pairwise sequence classification tasks
  11. Related Work
      - Learning task similarities: clustering of labels, inducing a shared prior, learning a grouping
        - Only works for tasks with the same label spaces
      - Label transformations: distributional information, correlation analysis
        - Typically applied prior to model training
  12. Related Work
      - Multi-task learning with neural networks: hard parameter sharing, different sharing structures
        - Does not take similarities between label spaces into account
      - Semi-supervised learning: self-training, tri-training, co-forest
        - Assumes the same label spaces
  13. Model 1: Label Embedding Layer
  14. Multi-Task Learning (recap): shared hidden layers; separate inputs for each task; separate output layers + classification functions; negative log-likelihood objectives
  15. Label Embedding Layer
  16. Label Embedding Layer: label embedding space; prediction with label compatibility function c(l, h) = l · h
  17. Label Embeddings
  18. Label Embeddings
      - Model relationships between labels in the joint embedding space, which might lead to downstream improvements
      - Crucially, the compatibility between an instance and any label can now be measured
      - This can be used to learn to transfer labels across datasets (a minimal sketch follows)
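A hedged sketch of the label embedding layer from slides 15-18: each label, across all tasks, gets a vector in one joint space, and predictions come from the compatibility c(l, h) = l · h between a label embedding and the instance's hidden representation. The label names, IDs, and dimensions are illustrative assumptions.

```python
# Sketch of prediction with the label compatibility function c(l, h) = l · h.
import torch
import torch.nn as nn

hidden_dim = 128
# One joint embedding matrix for the labels of *all* tasks (illustrative).
all_labels = ["negative", "positive",            # Topic-2
              "against", "none", "favor"]        # Stance
label_emb = nn.Embedding(len(all_labels), hidden_dim)

def label_scores(h, task_label_ids):
    """Compatibility c(l, h) = l · h for each of the task's labels."""
    l = label_emb(task_label_ids)     # (n_task_labels, hidden_dim)
    return h @ l.T                    # dot product with each label embedding

h = torch.randn(4, hidden_dim)                 # hidden states of 4 instances
stance_ids = torch.tensor([2, 3, 4])           # against, none, favor
probs = label_scores(h, stance_ids).softmax(dim=-1)   # p(label | instance)
# Because all labels share one space, c(l, h) is defined for any task's
# label, which is what makes label transfer across datasets possible.
```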
  19. Model 2: Label Transfer Network
  20. Label Transfer Network
      Goal: learn to produce pseudo-labels for the target task
      LTN_T = MLP([o_1, …, o_{T-1}])
      o_i = Σ_{j=1}^{L_i} p_j^{T_i} · l_j
      - The output label embedding o_i of task T_i is the sum of the task's label embeddings l_j, weighted with their predicted probabilities p_j^{T_i}
      - Trained on labelled target task data
      - Negative log-likelihood objective L_LTN to produce a pseudo-label for the target task
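A sketch of the Label Transfer Network under the same illustrative dimensions: each auxiliary task contributes its output label embedding o_i, the probability-weighted sum of its label embeddings, and an MLP over the concatenation [o_1, …, o_{T-1}] is trained with negative log-likelihood on labelled target-task data. The layer sizes and two-layer MLP are assumptions.

```python
# Sketch of the Label Transfer Network: o_i = sum_j p_j^{T_i} * l_j,
# LTN_T = MLP([o_1, ..., o_{T-1}]).
import torch
import torch.nn as nn

d = 128                                    # label embedding dimension
aux_label_embs = [torch.randn(2, d),       # aux task 1 with 2 labels
                  torch.randn(3, d)]       # aux task 2 with 3 labels
n_target_labels = 3

ltn = nn.Sequential(                       # MLP over concatenated o_i
    nn.Linear(len(aux_label_embs) * d, d), nn.ReLU(),
    nn.Linear(d, n_target_labels),
)

def pseudo_label_logits(aux_probs):
    """aux_probs[i]: (batch, L_i), task i's predicted label distribution."""
    # o_i: probability-weighted sum of task i's label embeddings.
    o = [p @ l for p, l in zip(aux_probs, aux_label_embs)]  # each (batch, d)
    return ltn(torch.cat(o, dim=-1))       # pseudo-label logits for target task

# Trained with a negative log-likelihood objective on labelled target data.
aux_probs = [torch.softmax(torch.randn(8, 2), dim=-1),
             torch.softmax(torch.randn(8, 3), dim=-1)]
gold = torch.randint(0, n_target_labels, (8,))
loss_ltn = nn.CrossEntropyLoss()(pseudo_label_logits(aux_probs), gold)
```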
  21. Model 3: Semi-Supervised MTL
  22. Semi-Supervised MTL
      Goal: relabel auxiliary task data as main task data using the LTN
      - The LTN can be used to produce pseudo-labels for auxiliary or unlabelled instances
      - The target task model is trained on the additional pseudo-labelled data, which is added iteratively
      - Additional loss: minimise the mean squared error between the model predictions p^{T_i} and the pseudo-labels z^{T_i} produced by the LTN (sketched below)
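The relabelling term from this slide, sketched under the same assumptions: a mean squared error between the model's predicted distributions p^{T_i} and the LTN's pseudo-labels z^{T_i}, added to the supervised negative log-likelihood. gamma is an assumed name for the trade-off weight, not a value from the paper.

```python
# Sketch of the semi-supervised objective: supervised NLL plus an MSE term
# pulling model predictions toward the LTN's pseudo-labels.
import torch
import torch.nn.functional as F

def relabelling_loss(model_logits, ltn_pseudo_labels):
    """MSE between model predictions p^{T_i} and LTN pseudo-labels z^{T_i}."""
    return F.mse_loss(model_logits.softmax(dim=-1), ltn_pseudo_labels)

gamma = 0.1                                       # assumed trade-off weight
logits_lab = torch.randn(8, 3)                    # labelled target batch
gold = torch.randint(0, 3, (8,))
logits_pseudo = torch.randn(8, 3)                 # iteratively added pseudo-labelled batch
z = torch.softmax(torch.randn(8, 3), dim=-1)      # z^{T_i} from the LTN
loss = F.cross_entropy(logits_lab, gold) + gamma * relabelling_loss(logits_pseudo, z)
```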
  23. Best-Performing Auxiliary Tasks
      Main task → best auxiliary tasks:
      - Topic-2: FNC-1, MultiNLI, Target
      - Topic-5: FNC-1, MultiNLI, ABSA-L, Target
      - Target: FNC-1, MultiNLI, Topic-5
      - Stance: FNC-1, MultiNLI, Target
      - ABSA-L: Topic-5
      - ABSA-R: Topic-5, ABSA-L, Target
      - FNC-1: Stance, MultiNLI, Topic-5, ABSA-R, Target
      - MultiNLI: Topic-5
      Trends:
      - Target is used by all Twitter main tasks
      - Tasks with a higher number of labels (e.g. Topic-5) are used more often
      - Tasks with more training data (FNC-1, MultiNLI) are used more often
  24. Overall Results (results figures, slides 24-28)
  29. Relabelling (figure)
  30. Overall Results
      - MTL models outperform STL models
      - Label embeddings improve performance
      - New state of the art on topic-based sentiment analysis
      - Learning to relabel data jointly with MTL tends not to improve performance further
      - Future work: use the relabelling model to label unlabelled data instead
  31. Thank you! augenstein@di.ku.dk @IAugenstein http://isabelleaugenstein.github.io/
  32. Datasets and Tasks (examples)
      Topic-based sentiment analysis:
        Tweet: No power at home, sat in the dark listening to AC/DC in the hope it'll make the electricity come back again
        Topic: AC/DC
        Label: positive
      Target-dependent sentiment analysis:
        Text: how do you like settlers of catan for the wii?
        Target: wii
        Label: neutral
      Aspect-based sentiment analysis:
        Text: For the price, you cannot eat this well in Manhattan
        Aspects: restaurant prices, food quality
        Label: positive
      Stance detection:
        Tweet: Be prepared - if we continue the policies of the liberal left, we will be #Greece
        Target: Donald Trump
        Label: favor
      Fake news detection:
        Document: Dino Ferrari hooked the whopper wels catfish, (...), which could be the biggest in the world.
        Headline: Fisherman lands 19 STONE catfish which could be the biggest in the world to be hooked
        Label: agree
      Natural language inference:
        Premise: Fun for only children
        Hypothesis: Fun for adults and children
        Label: contradiction
  33. Label Transfer Network (with or without semi-supervision)
  34. Label Embedding Layer: label embedding space; prediction with label compatibility function c(·, ·) that measures the similarity between label embedding l and hidden representation h: c(l, h) = l · h
  35. Learning with Limited Labelled Data: Why?
      General challenges:
      - Manually annotating training data is expensive
      - Only few large NLP datasets
      - New tasks and domains
      - Domain drift
      Multilingual and diversity aspects:
      - Underrepresented languages
      - Dialects
  36. Learning with Limited Labelled Data: How?
      - Domain Adaptation
      - Weakly Supervised Learning
      - Distant Supervision
      - Transfer Learning
      - Multi-Task Learning
      - Unsupervised Learning
