SHOULD YOU BE AFRAID OF
TRANSFORMERS?
LECTURE DEEP LEARNING MEETUP, COLOGNE, 21.05.2019
DOMINIK SEISSER
Intro
• Over the past year, a string of deep learning innovations has beaten
previous state-of-the-art results on NLP benchmarks
• We'll look at how we got there, what the future might look like and
what you can do with it
• Brief history of NLP deep learning with a small intermezzo on
ethics
About me
• 3 Gartner Cool Vendor AI/Big Data startups
• Gyana: Tech Lead/Head of AI, founded in Oxford,
"Top 10" AI startup in London – Geo Intelligence
• Cognigy: Building a leading NLU – conversational
AI, intent mapping etc.
Learning about the world from data
• To understand natural language we teach the machine
to learn from large text corpora
• Story of iterative improvements in our capacity to
machine-learn about the world from data
• Increasing compute capacity enables new algorithms
and state-of-the-art improvements
Methodological Progress
• Before deep learning NLP
⚡ Curse of dimensionality
• Word vector representations
⚡ No syntactic/semantic context
• Sequence models
⚡ Long-range dependencies and complexity
Recent innovations
• Transformers
• New language modelling and training
techniques
• Fine-tuning/transfer learning
Classic Statistical NLP/NLU
• TF-IDF
• Bag of words, n-grams...
• Hidden Markov Models etc.
• Still relevant if you care about speed and compute cost!
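As a rough illustration of why these classic methods stay cheap, here is a minimal TF-IDF sketch in plain Python. The function name `tf_idf`, the whitespace tokenisation and the unsmoothed `log(N / df)` weighting are assumptions made for this example; libraries usually apply smoothing and normalisation on top.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the dog barked"]
weights = tf_idf(docs)
```

A word that occurs in every document ("the") gets weight zero, while a word unique to one document ("cat") gets the highest weight in that document.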
Curse of dimensionality
From Steeve Huang, Word2Vec and FastText Word Embedding with Gensim (2018),
https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c
Vector representation breakthroughs
• Can we learn lower-dimensional word embeddings that capture some
meaning?
• Word2Vec
• Simple neural net Skip-Gram/CBOW model
• T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word
Representations in Vector Space,” arXiv:1301.3781 [cs], Jan. 2013.
• GloVe
• Word-word co-occurrence matrix
• J. Pennington, R. Socher, and C. Manning, “Glove: Global Vectors for Word
Representation,” 2014, pp. 1532–1543.
Word2Vec: Skip-gram
Word2Vec Tutorial - The Skip-Gram Model, McCormick 2016
http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
From vocabulary size V to hidden layer size N
T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv:1301.3781 [cs], Jan. 2013.
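A small sketch of the two ideas on these slides, in plain Python: generating (center, context) skip-gram training pairs, and projecting a one-hot vector of size V onto a hidden layer of size N. The helper names, the window size and the toy weight values are assumptions for illustration, not taken from the paper or the tutorial.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]

def project(one_hot_vec, weight_matrix):
    """Multiply a 1 x V one-hot vector with a V x N matrix, giving 1 x N."""
    n_cols = len(weight_matrix[0])
    return [
        sum(x * row[c] for x, row in zip(one_hot_vec, weight_matrix))
        for c in range(n_cols)
    ]

tokens = ["the", "quick", "brown", "fox"]
vocab = {word: idx for idx, word in enumerate(tokens)}  # V = 4
pairs = skipgram_pairs(tokens, window=1)

# Toy V x N input weight matrix with N = 2 (values are arbitrary).
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
hidden = project(one_hot(vocab["brown"], len(vocab)), W)
```

Because the input is one-hot, the projection simply selects one row of `W`, so the rows of the trained input matrix can be read off as the N-dimensional word embeddings.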
[Figure: word-vector arithmetic with the vectors for woman, queen, man and king: king - man + woman ≈ queen]
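The king/queen analogy can be reproduced with hand-made toy vectors. The two-dimensional "embeddings" below are invented for this sketch (axis 0 roughly royalty, axis 1 roughly gender) and are not learned from data; real Word2Vec vectors have hundreds of dimensions and only satisfy the analogy approximately.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-made 2-d toy vectors (assumption): axis 0 ~ royalty, axis 1 ~ gender.
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
```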
Sequence models
• RNNs, LSTM, GRU...
• State of the art up until last year:
some variant of a recurrent model + attention
• Suffer from structure dilemma and vanishing gradient
problem
Attention
K. M. Hermann et al., “Teaching Machines to Read and
Comprehend,” arXiv:1506.03340 [cs], Jun. 2015.
Attention
D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine
Translation by Jointly Learning to Align and Translate,”
arXiv:1409.0473 [cs, stat], Sep. 2014.
What is attention?
• Vector interface to query (pay attention to)
salient features
• Answer → Question
• Translation → Source
• One position/token in a sentence
→ All positions/tokens in a sentence
• Form of regularization to enable learning
• Produce salient semantic and syntactic features,
like attending to subject of a sentence
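One way to make "one position attends to all positions" concrete is a softmax-weighted sum over value vectors. The sketch below assumes scaled dot-product scoring, which is only one possible choice (Bahdanau et al. score with a small feed-forward network instead); the function names and toy vectors are invented for the example.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, context) for one query over all positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: attention-weighted average of the value vectors.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]
weights, context = attend(query, keys, values)
```

The weights sum to one, and positions whose key matches the query contribute most to the context vector.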
Transformers
• First sequence transduction
models with attention only
• No RNNs or convolutions, just
positional encodings
→ lower computational complexity
+ no sequential operations
• Attention mechanisms specialise
on different tasks and capture
semantic + syntactic structure
A. Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762 [cs], Jun. 2017.
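Since the model has no recurrence, word order has to be injected through positional encodings. Below is a minimal sketch of the sinusoidal variant described in the paper, where even dimensions use a sine and odd dimensions a cosine of the same frequency; the function name and the small table size are assumptions for illustration.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: even dims use sin, odd dims use cos."""
    table = []
    for pos in range(num_positions):
        row = []
        for dim in range(d_model):
            # Dimension pairs (2i, 2i+1) share one frequency.
            angle = pos / (10000 ** ((dim - dim % 2) / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=8)
```

Each row is added to the token embedding at that position, so identical words at different positions get different input vectors.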
Transformers: co-reference resolution
A. Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762 [cs], Jun. 2017.
Should we be afraid of transformers?
• OpenAI lab decided not to release the more powerful
GPT-2 model because of AI safety concerns
• OpenAI started as a non-profit, now turned for-profit
• Push to make AI research private and proprietary for
AI safety and commercial gain
• Most public university AI labs bought by industry
What does this mean?
• Safety concern genuine – expect more efforts and
developments in this direction as AI becomes an
increasingly relevant technology for modern warfare
• Are we at the brink of an AI cold war?
• Is responsible and open cooperation between
scientists and nations possible?
• AI is not dangerous, people are → more important
than ever to stand up for peace, open and free science
New kid on the block
BERT & Friends: recent innovations
• Language modelling and fine-tuning
• ULMFit (J. Howard and S. Ruder, “Universal Language Model Fine-tuning for Text Classification,”
arXiv:1801.06146 [cs, stat], Jan. 2018)
• ELMo (M. E. Peters et al., “Deep contextualized word representations,” arXiv:1802.05365 [cs], Feb.
2018)
• GPT (A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language
Understanding by Generative Pre-Training,” p. 12)
• BERT (J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding,” arXiv:1810.04805 [cs], Oct. 2018)
Fine-tuning
• ULMFit (J. Howard and S. Ruder, “Universal Language Model Fine-tuning for Text
Classification,” arXiv:1801.06146 [cs, stat], Jan. 2018)
Contextual embeddings
• ELMo (M. E. Peters et al., “Deep contextualized word representations,” arXiv:1802.05365 [cs], Feb.
2018)
• Language model next-token prediction task
• Bi-directional LSTM looking at the entire sentence
• Extract contextual embedding from hidden states
Generative Pretraining
• Generative Pre-Training (A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever,
“Improving Language Understanding by Generative Pre-Training,” p. 12)
BERT
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language
Understanding,” arXiv:1810.04805 [cs], Oct. 2018.
• Bi-Directional model achieved through masking
• Cloze task
• Sentence/Phrase switching
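The masking (cloze) objective can be sketched as a function that hides some input tokens and keeps the originals as prediction targets. The 15% selection rate and the 80/10/10 split between `[MASK]`, a random token and the unchanged token follow the BERT paper; the function name, the toy vocabulary and the use of `None` for "no target" are assumptions of this example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels); labels are None where no prediction is needed."""
    rng = rng or random.Random()
    inputs, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            inputs.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs.append(MASK)               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            inputs.append(token)              # 10%: keep the token unchanged
    return inputs, labels

tokens = "the man went to the store".split()
vocab = sorted(set(tokens))
# mask_prob is raised to 0.5 here only so the short toy sentence shows some masks.
inputs, labels = mask_tokens(tokens, vocab, mask_prob=0.5, rng=random.Random(0))
```

Because the target can sit anywhere in the sentence, the model sees both left and right context when predicting it, which is what makes the pre-training bidirectional.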
Fine-Tuning
• Take pre-trained model
• Adapt classification layer
to task
• Train for few epochs
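The recipe above can be sketched with a toy stand-in. Here `encode` is a hypothetical placeholder for a pre-trained model (it just counts a few sentiment words), and only a small logistic-regression head is trained for a few epochs. Note the simplification: BERT-style fine-tuning usually also updates the pre-trained weights, whereas this sketch keeps the encoder frozen for brevity.

```python
import math

# Stand-in for a pre-trained encoder (hypothetical): maps text to a feature
# vector. A real setup would use the pre-trained model's output here.
def encode(text):
    words = text.lower().split()
    positive = sum(w in {"good", "great", "love"} for w in words)
    negative = sum(w in {"bad", "awful", "hate"} for w in words)
    return [float(positive), float(negative)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=5, lr=0.5):
    """Train a logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = encode(text)
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def predict(text, weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, encode(text))) + bias)

train = [("great movie love it", 1), ("awful plot hate it", 0),
         ("good fun", 1), ("bad acting", 0)]
weights, bias = train_head(train)
```

After a handful of epochs the head already separates the two classes, which mirrors why fine-tuning needs far less data and compute than pre-training.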
Top F1 score by year on the Stanford Question
Answering Dataset (SQuAD) 1.1 (chart, F1 axis from 70 to 95, with human-level performance marked):
• Dynamic Coattention Networks: 80.383 (Nov 01, 2016)
• AttentionReader+: 88.163 (Dec 22, 2017)
• BERT: 93.16 (Oct 05, 2018)
SQuAD 1.1 contains 100,000+ question-answer
pairs on 500+ Wikipedia articles.
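The F1 score used for SQuAD measures token overlap between the predicted and the reference answer. A minimal sketch, assuming whitespace tokenisation and lowercasing only; the official evaluation script also normalises answers further (for example punctuation and articles) and compares against several reference answers, which is omitted here.

```python
from collections import Counter

def token_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    true_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

score = token_f1("in the city of Cologne", "Cologne")
```

An overly long answer that contains the right span is penalised through precision: here one of five predicted tokens is correct, giving an F1 of 1/3.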
How to get started
• Authors' implementation in TF: https://github.com/google-research/bert
• PyTorch implementation:
https://github.com/huggingface/pytorch-pretrained-BERT
• Colab with free TPU:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb
• Flair https://github.com/zalandoresearch/flair
• Apache 2.0 licensed
Production
• That a huge model is computationally cheap to train doesn't make it
cheap to run in production
• Multilingual BERT Base in 102 languages + Chinese
• Training your own full model is expensive, TPUs best option
• Accumulating gradients and other tricks to train larger batches:
https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
• BERT As Service: https://github.com/hanxiao/bert-as-service
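Gradient accumulation, mentioned above, trades time for memory: gradients from several small micro-batches are summed before a single weight update, which gives the same step as one large batch. The sketch below shows this on a one-parameter linear model in plain Python; the model, data and function names are assumptions chosen to keep the arithmetic checkable by hand.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, data, micro_batch_size, lr):
    """One optimizer step whose gradient is accumulated over micro-batches."""
    micro_batches = [data[i:i + micro_batch_size]
                     for i in range(0, len(data), micro_batch_size)]
    accumulated = 0.0
    for batch in micro_batches:
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        accumulated += grad_mse(w, batch) * len(batch) / len(data)
    return w - lr * accumulated  # single weight update after all micro-batches

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_full = 0.0 - 0.01 * grad_mse(0.0, data)
w_accum = accumulated_step(0.0, data, micro_batch_size=2, lr=0.01)
```

Both updates land on the same weight, so only one micro-batch has to fit in memory at a time.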
Future
• Improvements on BERT approach –
RNNs are not dead
• Better, more complex context/meaning
representations
• Increase in compute capacity, data and models
that best exploit available resources
Recap
• Deep Learning NLP: Learning
ever better language models
• ImageNet moment – transfer
learning breakthrough
• Latest innovations around
Transformers and BERT
CONVERSATIONAL AI PLATFORM
DÜSSELDORF | SAN FRANCISCO
WE’RE HIRING! WWW.COGNIGY.COM
SHOULD WE BE AFRAID OF TRANSFORMERS?
LECTURE
DÜSSELDORF, MAY 2019
DOMINIK SEISSER | D.SEISSER@COGNIGY.COM
LINKEDIN.COM/IN/SEISSER
References
• J. Pennington, R. Socher, and C. Manning, “Glove: Global Vectors for Word Representation,” 2014.
• T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv:1301.3781 [cs], Jan. 2013.
• A. Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762 [cs], Jun. 2017.
• J. Howard and S. Ruder, “Universal Language Model Fine-tuning for Text Classification,” arXiv:1801.06146 [cs, stat], Jan. 2018.
• M. E. Peters et al., “Deep contextualized word representations,” arXiv:1802.05365 [cs], Feb. 2018.
• A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training.”
• J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv:1810.04805 [cs], Oct. 2018.
• Alammar, Jay. n.d. “The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning).” Accessed May 22, 2019. https://jalammar.github.io/illustrated-bert/.
• “Better Language Models and Their Implications.” 2019. OpenAI. February 14, 2019. https://openai.com/blog/better-language-models/.
• “NLP’s ImageNet Moment Has Arrived.” 2018. Sebastian Ruder. July 12, 2018. http://ruder.io/nlp-imagenet/.
• “Transformer: A Novel Neural Network Architecture for Language Understanding.” n.d. Google AI Blog (blog). Accessed May 22, 2019. http://ai.googleblog.com/2017/08/transformer-novel-neural-network.html.
