
Emily Pitler - Representations from Natural Language Data: Successes and Challenges


  1. Representations from Natural Language Data: Successes and Challenges. Emily Pitler, Google AI
  2. State of the art in Natural Language Understanding in 2017: custom recurrent architectures
  3. Oct. 2018: One Model with Task-specific Tuning in Minutes
  4. BERT: Bidirectional Encoder Representations from Transformers
  5. Transformers: "Attention Is All You Need" (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin, NIPS 2017). https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
  6.–10. From One-Hot Vectors to Word Embeddings & Self-Attention [figure, built up over five slides: the tokens "animal", "street", "it" start as one-hot vectors, are mapped to dense embeddings, are projected to query, key, and value vectors, and finally receive (self-)attention weights such as 0.1, 0.2, 0.7]. See: The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT.
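To make the query/key/value picture on slides 6–10 concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention as defined in Vaswani et al. (2017). The dimensions and random inputs are illustrative, not taken from the talk.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) projection matrices.
    Returns (seq_len, d_k) contextualized vectors and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1, e.g. [0.1, 0.2, 0.7]
    return weights @ V, weights

# Toy example: 3 tokens (think "animal", "street", "it"), d_model=8, d_k=4.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(weights.round(2))  # how strongly each token attends to every other token
```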
  11. OpenAI: Generative Pretraining [figure: stacks of Transformer layers trained left to right; inputs "<s> the … too" and "<s> the … tired", predictions include the next words of "The animal … tired" and the label "Acceptable"]
  12. Understanding Can Need "Future" Information. "How far is Jacksonville from Miami?" → "Jacksonville is in the First Coast region of northeast Florida and is centered on the banks of the St. Johns River, about 25 miles (40 km) south of the Georgia state line and about 340 miles (550 km) north of Miami." "Mark which area you want to distress." (Mark = VERB) vs. "Mark, which area do you want to distress?" (Mark = NOUN)
  13. Naive Bidirectionality: Words Can "See Themselves" [figure: bidirectional Transformer stacks in which the prediction at each position can attend to that position's own input token]
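The directionality contrast on slides 11–13 comes down to the attention mask. A left-to-right model hides future positions; a naively bidirectional one lets position i attend to position i itself, so predicting token i becomes trivial copying. A small sketch of the two masks (my illustration, not code from the talk):

```python
import numpy as np

def causal_mask(n):
    # Left-to-right LM: position i may attend only to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def naive_bidirectional_mask(n):
    # Every position sees every position, including itself: if the
    # training target at position i is the token at position i, the
    # model can simply copy its own input ("words can see themselves").
    return np.ones((n, n), dtype=bool)

print(causal_mask(4).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

BERT's resolution, on the next slide, keeps attention fully bidirectional but hides tokens in the input itself via [MASK].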
  14. Training BERT: Masked Language Model (fill in the blank). "Deep learning (also [MASK] [MASK] deep structured learning or [MASK] learning) is part of a broader family of machine learning methods [MASK] on [MASK] data representations, as opposed to task-specific algorithms." (https://en.wikipedia.org/wiki/Deep_learning) "[MASK] is allergic to peaches." (https://en.wikipedia.org/wiki/Daniel_Tiger%27s_Neighborhood) Pretraining data includes BooksCorpus: Zhu, Kiros, Zemel, Salakhutdinov, Urtasun, Torralba, Fidler, CVPR 2015.
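Concretely, the masked-LM objective corrupts the input rather than the attention pattern. A sketch of the corruption rule from Devlin et al. (2018): choose roughly 15% of tokens; of those, 80% become [MASK], 10% become a random token, and 10% stay unchanged, and the model must recover the originals. The tiny vocabulary here is a placeholder.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "a", "deep", "learning", "is", "peaches"]  # toy stand-in vocabulary

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (corrupted tokens, labels); labels are None where nothing is predicted."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                       # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)                # 10%: keep unchanged
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

print(mask_tokens("deep learning is part of a broader family".split(), seed=0))
```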
  15.–17. Basic BERT Recipe [figure, built up over three slides]
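The BERT recipe is: pretrain once on unlabeled text, then fine-tune all weights plus a small task head for the target task. As a hedged sketch, here is that fine-tuning step using the open-source Hugging Face `transformers` API (one convenient implementation; the release accompanying the talk was the google-research/bert TensorFlow repo, and the model name and toy data below are assumptions for illustration):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a pretrained BERT and attach a fresh 2-way classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Toy sentence-pair example in the MRPC style: are these paraphrases?
batch = tokenizer(["The storm left damage."],
                  ["Damage was caused by the storm."],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1])

# Small learning rate: we are tuning a pretrained model, not training from scratch.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```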
  18. Results: Commonsense Reasoning and Question Answering. SWAG: Zellers, Bisk, Schwartz, Choi, EMNLP 2018. SQuAD: Rajpurkar, Zhang, Lopyrev, Liang, EMNLP 2016.
  19. Pretraining Tasks Matter ... and Bigger = Better*. MRPC: Dolan and Brockett, IWP 2005.
  20. Do I Need Full BERT Models for All My Tasks? Houlsby, Giurgiu, Jastrzebski, Morrone, de Laroussilhe, Gesmundo, Attariyan, Gelly, arXiv, Feb. 2019.
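Slide 20's alternative (Houlsby et al., 2019) keeps the pretrained Transformer frozen and inserts small bottleneck "adapter" layers that are the only parameters trained per task. A minimal PyTorch sketch of one adapter block following the paper's residual bottleneck design; the dimensions are illustrative:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: project down, nonlinearity, project up, residual add.

    Only these few parameters are trained per task; the Transformer layers
    around the adapter stay frozen, so many tasks can share one BERT.
    """
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # start near the identity function,
        nn.init.zeros_(self.up.bias)    # so inserting the adapter is stable

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

h = torch.randn(2, 5, 768)   # (batch, seq_len, hidden) from a frozen layer
print(Adapter()(h).shape)    # torch.Size([2, 5, 768])
```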
  21. Try It Out, Get Faster Training with TPUs
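The BERT release of the talk's era targeted TPUs through TF 1.x estimators; as one way to try this today, here is a minimal TensorFlow 2 sketch of connecting to a TPU with `tf.distribute.TPUStrategy`. The environment setup (e.g. a Colab TPU or Cloud TPU VM) and the placeholder Keras model are assumptions.

```python
import tensorflow as tf

# Discover and initialize the TPU; arguments depend on your environment.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Build and compile the model inside the scope so its variables are
    # replicated across TPU cores; a real setup would build BERT here.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```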
  22. Mismatches between Training and Realistic Inputs. Two Case Studies: Mixed-Language Text and Identifying Commands.
  23.–24. Multiple Languages: Frequent "In-the-Wild," Rare in Training [figure, built up over two slides]
  25. "A Fast, Compact, Accurate Model for Language Identification of Codemixed Text." Zhang, Riesa, Gillick, Bakalov, Baldridge, Weiss, EMNLP 2018.
  26.–27. Accuracy and Speed of Token-Level Language ID [results figure, built up over two slides]
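For code-mixed text, language ID has to be made per token rather than per document. The EMNLP 2018 model is a small, fast feed-forward network over character n-gram features; the sketch below shows only the general idea, with a hypothetical two-language scorer whose weights are made up for illustration (not the paper's features, model, or parameters):

```python
from collections import Counter

def char_ngrams(token, n=3):
    """Character n-grams with boundary markers, e.g. 'the' -> ^th, the, he$."""
    padded = f"^{token.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Hypothetical toy "model": per-language n-gram weights standing in for a
# trained feed-forward classifier over the same kind of features.
WEIGHTS = {
    "en": Counter({"^th": 2.0, "he$": 1.5, "ey$": 1.2, "^ke": 1.0}),
    "es": Counter({"^da": 2.0, "me$": 1.5, "^la": 1.8}),
}

def token_lang(token):
    scores = {lang: sum(w[g] for g in char_ngrams(token))
              for lang, w in WEIGHTS.items()}
    return max(scores, key=scores.get)

print([(t, token_lang(t)) for t in "dame the keys".split()])
# [('dame', 'es'), ('the', 'en'), ('keys', 'en')]
```

The real model also uses context from neighboring tokens when decoding, since isolated tokens are often ambiguous between languages.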
  28. Useful Preprocessing Step Across Tasks
  29. Mismatches between Training and Realistic Inputs. Two Case Studies: Mixed-Language Text and Identifying Commands (transition to the second case study).
  30. Noun-Verb Ambiguity: "lives"/NOUN → /laɪvz/ vs. "lives"/VERB → /lɪvz/; other homographs include "flies" (NOUN) and "Mark" (VERB). Elkahky, Webster, Andor, Pitler, EMNLP 2018.
  31. "Certain insects can damage plumerias, such as mites, flies, or aphids." (flies = NOUN) "Mark which area you want to distress." (Mark = VERB) "A Challenge Set and Methods for Noun-Verb Ambiguity," EMNLP 2018.
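The point of slides 30–31 is that text-to-speech needs this syntactic disambiguation: the homograph's noun/verb tag selects its pronunciation. A small sketch of that lookup step; the "lives" pronunciations come from the slide, while the other entries and the surrounding plumbing are placeholders:

```python
# (word, POS) -> pronunciation for a few noun-verb homographs.
# The "lives" entries are from the talk; "record" is a common extra example.
HOMOGRAPHS = {
    ("lives", "NOUN"): "/laɪvz/",    # "nine lives"
    ("lives", "VERB"): "/lɪvz/",     # "she lives here"
    ("record", "NOUN"): "/ˈrɛkərd/",
    ("record", "VERB"): "/rɪˈkɔrd/",
}

def pronounce(word, pos_tag):
    """Pick a pronunciation given the tagger's noun/verb decision."""
    return HOMOGRAPHS.get((word.lower(), pos_tag),
                          f"<default pronunciation of {word!r}>")

# A tagger that mislabels "Mark" in "Mark which area you want to distress."
# as NOUN would feed the wrong pronunciation and prosody downstream.
print(pronounce("lives", "NOUN"))  # /laɪvz/
print(pronounce("lives", "VERB"))  # /lɪvz/
```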
  32. Accuracy on Noun-Verb Disambiguation [results figure]
  33. Pronunciation of Homographs: Accuracy [results figure]
  34. Released Datasets with "In-the-Wild" Natural Challenges. Webster, Recasens, Axelrod, Baldridge, TACL 2019. Kwiatkowski, Palomaki, Redfield, Collins, Parikh, Alberti, Epstein, Polosukhin, Kelcey, Devlin, Lee, Toutanova, Jones, Chang, Dai, Uszkoreit, Le, Petrov, TACL 2019.
  35. Summary
