SlideShare a Scribd company logo
1 of 20
Download to read offline
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
A Pragmatic Introduction to
Natural Language Processing models
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
https://medium.com/@julsimon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Problem statement
• Since the dawn of AI, NLP has been a major research field
• Text classification, machine translation, text generation, chat bots, voice assistants, etc.
• NLP apps require a language model in order to predict the next word
• Given a sequence of words (w1, … , wn), predict wn+1 that has the highest probability
• Vocabulary size can be hundreds of thousands of words
… in millions of documents
• Can we build a compact mathematical representation of language,
that would help us with a variety of downstream NLP tasks?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
« You shall know a word by the company it keeps », Firth (1957)
• Initial idea: build word vectors from co-occurrence counts
• Also called embeddings
• High dimensional: at least 50, up to 300
• Much more compact than an n*n matrix, which would be extremely sparse
• Words with similar meanings should have similar vectors
• “car” ≈ “automobile” ≈ “sedan”
• The distance between related vectors should be similar
• distance (“Paris”, ”France”) ≈ distance(“Berlin”, ”Germany”)
• distance(“hot”, ”hotter”) ≈ distance(“cold”, ”colder”)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
High-level view
1. Start from a large text corpus (100s of millions of words, even billions)
2. Preprocess the corpus
• Tokenize: « hello, world! » → « <BOS>hello<SP>world<SP>!<EOS>»
• Multi-word entities: « Rio de Janeiro » → « rio_de_janeiro »
3. Build the vocabulary
4. Learn vector representations for all words
… or simply use (or fine-tune) pre-trained vectors (more on this later)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Word2Vec (2013)
https://arxiv.org/abs/1301.3781
https://code.google.com/archive/p/word2vec/
• Continuous bag of words (CBOW):
• the model predicts the current word from surrounding context words
• Word order doesn’t matter (hence the ‘bag of words’)
• Skipgram
• the model uses the current word to predict the surrounding window of
context words
• This may work better when little data is available
• CBOW trains faster, skipgram is more accurate
• C code, based on shallow neural network
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Global Vectors aka GloVe (2014)
https://nlp.stanford.edu/projects/glove/
https://github.com/stanfordnlp/GloVe
https://www.quora.com/How-is-GloVe-different-from-word2vec
• Performance generally similar to Word2Vec
• Several pre-trained embedding collections:
up to 840 billion tokens, 2.2 million vocabulary, 300 dimensions
• C code, based on matrix factorization
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
FastText (2016)
https://arxiv.org/abs/1607.04606 + https://arxiv.org/abs/1802.06893
https://fasttext.cc/
https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText
• Extension of Word2Vec: each word is a set of subwords, aka n-grams
• « Computer », n=5 : <START>Comp , compu, omput, mpute, puter, uter<END>
• A word vector is the average of its subword vectors
• Good for unknown/mispelled words, as they share subwords with known words
• « Computerized » and « Cmputer » should be close to « Computer »
• Three modes
• Unsupervised learning: compute embeddings (294 languages available)
• Supervised learning: use / fine-tune embeddings for multi-label, multi-class classification
• Also language detection for 170 languages
• Multithreaded C++ code, with Python API
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://dl.acm.org/citation.cfm?id=3146354
https://aws.amazon.com/blogs/machine-learning/enhanced-text-classification-and-
word-vectors-using-amazon-sagemaker-blazingtext/
• Amazon-invented algorithm,
available in Amazon SageMaker
• Extends FastText with GPU capabilities
• Unsupervised learning: word vectors
• 20x faster
• CBOW and skip-gram with subword support
• Batch skip-gram for distributed training
• Supervised learning: text classification
• 100x faster
• Models are compatible with FastText
BlazingText (2017)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Limitations of Word2Vec (and family)
• Some words have different meanings (aka polysemy)
• « Kevin, stop throwing rocks! » vs. « Machine Learning rocks »
• Word2Vec encodes the different meanings of a word into the same vector
• Bidirectional context is not taken into account
• Previous words (left-to-right) and next words (right-to-left)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Embeddings from Language Models aka ELMo (02/2018)
https://arxiv.org/abs/1802.05365
https://allennlp.org/elmo
• ELMo generates a context-aware vector
for each word
• Character-level CNN + two unidirectional LSTMs
• Bidirectional context, no cheating possible
(can’t peek at the next word)
• “Deep” embeddings, reflecting output from all layers
• No vocabulary, no vector file:
you need to use the model itself
• Reference implementation
with TensorFlow
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bidirectional EncoderRepresentationsfrom Transformers aka BERT (10/2018)
https://arxiv.org/abs/1810.04805
https://github.com/google-research/bert
https://www.quora.com/What-are-the-main-differences-between-the-word-embeddings-of-ELMo-BERT-Word2vec-and-GloVe
• BERT improves on ELMo
• Replace LSTM with Transformers,
which deal better with long-term dependencies
• True bidirectional architecture: left-to-right and right-to-left
contexts are learned by the same network
• 15% of words are randomly masked during training
to prevent cheating
• Pre-trained models: BERT Base and BERT Large
• Masked word prediction
• Next sentence prediction
• Reference implementation with TensorFlow
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Limitations of BERT
• BERT cannot handle more than 512 input tokens
• BERT masks words during training, but not during fine-tuning
(aka training/fine-tuning discrepancy)
• BERT isn’t trained to predict the next word, so it’s not great at text generation
• BERT doesn’t learn dependencies for masked words
• Train « I am going to <MASK> my <MASK> » on « walk » / « dog », « eat » / « breakfast », and « debug » / « code ».
• BERT could legitimately predict « I am going to eat my code » or « I am going to debug my dog » :-/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
XLNet (06/2019)
https://arxiv.org/abs/1906.08237
https://github.com/zihangdai/xlnet
• XLNet beats BERT at 20 tasks
• XLNet uses bidirectional context,
but words are randomly permuted
• No cheating possible
• No masking required
• XLNet Base and XLNet Large
• Reference implementation with TensorFlow
07/2019: ERNIE 2.0 (Baidu)
beats BERT & XLNet
https://github.com/PaddlePaddle/ERNIE/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Summing things up
• Word2Vec and friends
• Try pre-trained embeddings first
• Check that the training corpus is similar to your own data
• Same language, similar vocabulary
• Remember that subword models will help with unknown / mispelled words
• If you have exotic requirements AND lots of data, training is not expensive
• ElMo, BERT, XLNet
• Training is very expensive: several days using several GPUs
• Fine-tuning is cheap: just a few GPU hours for SOTA results
• Fine-tuning scripts and pre-trained models are available: start there!
• The ”best model” is the one that works best on your business problem
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Demo: finding similarities and analogies
with Gluon NLP and pre-trained GloVe embeddings
https://gitlab.com/juliensimon/dlnotebooks/gluonnlp/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Demo: embeddings with ELMo on TensorFlow
https://gitlab.com/juliensimon/dlnotebooks/blob/master/nlp/ELMO%20TensorFlow.ipynb
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Getting started on AWS
https://ml.aws
https://aws.amazon.com/marketplace/solutions/machine-learning/natural-
language-processing → off the shelf algos and models that might just save you from
this madness ;)
https://aws.amazon.com/sagemaker
https://github.com/awslabs/amazon-sagemaker-examples
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
https://medium.com/@julsimon

More Related Content

What's hot

What's hot (20)

AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
Build, train, and deploy ML models at scale.pdf
Build, train, and deploy ML models at scale.pdfBuild, train, and deploy ML models at scale.pdf
Build, train, and deploy ML models at scale.pdf
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Workshop: Build Deep Learning Applications with TensorFlow and SageMaker
Workshop: Build Deep Learning Applications with TensorFlow and SageMakerWorkshop: Build Deep Learning Applications with TensorFlow and SageMaker
Workshop: Build Deep Learning Applications with TensorFlow and SageMaker
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate
Build Text Analytics Solutions with Amazon Comprehend and Amazon TranslateBuild Text Analytics Solutions with Amazon Comprehend and Amazon Translate
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)
 
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
 
Speed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsSpeed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithms
 
Machine Learning on AWS (December 2018)
Machine Learning on AWS (December 2018)Machine Learning on AWS (December 2018)
Machine Learning on AWS (December 2018)
 
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerDeep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
 
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge
AWS re:Invent 2018 - AIM302  - Machine Learning at the Edge AWS re:Invent 2018 - AIM302  - Machine Learning at the Edge
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge
 
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Workshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
Workshop: Build an Image-Based Automatic Alert System with Amazon RekognitionWorkshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
Workshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
 
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
 
AI & ML on AWS: State of the Union
AI & ML on AWS: State of the UnionAI & ML on AWS: State of the Union
AI & ML on AWS: State of the Union
 

Similar to A pragmatic introduction to natural language processing models (October 2019)

NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
Sanghamitra Deb
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
Lucidworks
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013
Iván Montes
 

Similar to A pragmatic introduction to natural language processing models (October 2019) (20)

NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
 
GluonCV
GluonCVGluonCV
GluonCV
 
Introduction to GluonCV
Introduction to GluonCVIntroduction to GluonCV
Introduction to GluonCV
 
Envisioning the Future of Language Workbenches
Envisioning the Future of Language WorkbenchesEnvisioning the Future of Language Workbenches
Envisioning the Future of Language Workbenches
 
Dean4j@Njug5
Dean4j@Njug5Dean4j@Njug5
Dean4j@Njug5
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
 
Java And Community Support
Java And Community SupportJava And Community Support
Java And Community Support
 
Building Your Smart Applications with Machine Learning on AWS | AWS Webinar
Building Your Smart Applications with Machine Learning on AWS | AWS WebinarBuilding Your Smart Applications with Machine Learning on AWS | AWS Webinar
Building Your Smart Applications with Machine Learning on AWS | AWS Webinar
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013
 
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex PollexyMCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
 
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und ExpertenMaschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
 
Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017
 
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
 
Tensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS SagemakerTensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS Sagemaker
 
AWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AIAWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AI
 
How Amazon AI Can Help You Transform Your Education Business | AWS Webinar
How Amazon AI Can Help You Transform Your Education Business | AWS WebinarHow Amazon AI Can Help You Transform Your Education Business | AWS Webinar
How Amazon AI Can Help You Transform Your Education Business | AWS Webinar
 

More from Julien SIMON

More from Julien SIMON (16)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

A pragmatic introduction to natural language processing models (October 2019)

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. A Pragmatic Introduction to Natural Language Processing models Julien Simon Global Evangelist, AI & Machine Learning @julsimon https://medium.com/@julsimon
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Problem statement • Since the dawn of AI, NLP has been a major research field • Text classification, machine translation, text generation, chat bots, voice assistants, etc. • NLP apps require a language model in order to predict the next word • Given a sequence of words (w1, … , wn), predict wn+1 that has the highest probability • Vocabulary size can be hundreds of thousands of words … in millions of documents • Can we build a compact mathematical representation of language, that would help us with a variety of downstream NLP tasks?
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. « You shall know a word by the company it keeps », Firth (1957) • Initial idea: build word vectors from co-occurrence counts • Also called embeddings • High dimensional: at least 50, up to 300 • Much more compact than an n*n matrix, which would be extremely sparse • Words with similar meanings should have similar vectors • “car” ≈ “automobile” ≈ “sedan” • The distance between related vectors should be similar • distance (“Paris”, ”France”) ≈ distance(“Berlin”, ”Germany”) • distance(“hot”, ”hotter”) ≈ distance(“cold”, ”colder”)
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. High-level view 1. Start from a large text corpus (100s of millions of words, even billions) 2. Preprocess the corpus • Tokenize: « hello, world! » → « <BOS>hello<SP>world<SP>!<EOS>» • Multi-word entities: « Rio de Janeiro » → « rio_de_janeiro » 3. Build the vocabulary 4. Learn vector representations for all words … or simply use (or fine-tune) pre-trained vectors (more on this later)
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Word2Vec (2013) https://arxiv.org/abs/1301.3781 https://code.google.com/archive/p/word2vec/ • Continuous bag of words (CBOW): • the model predicts the current word from surrounding context words • Word order doesn’t matter (hence the ‘bag of words’) • Skipgram • the model uses the current word to predict the surrounding window of context words • This may work better when little data is available • CBOW trains faster, skipgram is more accurate • C code, based on shallow neural network
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Global Vectors aka GloVe (2014) https://nlp.stanford.edu/projects/glove/ https://github.com/stanfordnlp/GloVe https://www.quora.com/How-is-GloVe-different-from-word2vec • Performance generally similar to Word2Vec • Several pre-trained embedding collections: up to 840 billion tokens, 2.2 million vocabulary, 300 dimensions • C code, based on matrix factorization
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. FastText (2016) https://arxiv.org/abs/1607.04606 + https://arxiv.org/abs/1802.06893 https://fasttext.cc/ https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText • Extension of Word2Vec: each word is a set of subwords, aka n-grams • « Computer », n=5 : <START>Comp , compu, omput, mpute, puter, uter<END> • A word vector is the average of its subword vectors • Good for unknown/mispelled words, as they share subwords with known words • « Computerized » and « Cmputer » should be close to « Computer » • Three modes • Unsupervised learning: compute embeddings (294 languages available) • Supervised learning: use / fine-tune embeddings for multi-label, multi-class classification • Also language detection for 170 languages • Multithreaded C++ code, with Python API
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. https://dl.acm.org/citation.cfm?id=3146354 https://aws.amazon.com/blogs/machine-learning/enhanced-text-classification-and- word-vectors-using-amazon-sagemaker-blazingtext/ • Amazon-invented algorithm, available in Amazon SageMaker • Extends FastText with GPU capabilities • Unsupervised learning: word vectors • 20x faster • CBOW and skip-gram with subword support • Batch skip-gram for distributed training • Supervised learning: text classification • 100x faster • Models are compatible with FastText BlazingText (2017)
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Limitations of Word2Vec (and family) • Some words have different meanings (aka polysemy) • « Kevin, stop throwing rocks! » vs. « Machine Learning rocks » • Word2Vec encodes the different meanings of a word into the same vector • Bidirectional context is not taken into account • Previous words (left-to-right) and next words (right-to-left)
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Embeddings from Language Models aka ELMo (02/2018) https://arxiv.org/abs/1802.05365 https://allennlp.org/elmo • ELMo generates a context-aware vector for each word • Character-level CNN + two unidirectional LSTMs • Bidirectional context, no cheating possible (can’t peek at the next word) • “Deep” embeddings, reflecting output from all layers • No vocabulary, no vector file: you need to use the model itself • Reference implementation with TensorFlow
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Bidirectional EncoderRepresentationsfrom Transformers aka BERT (10/2018) https://arxiv.org/abs/1810.04805 https://github.com/google-research/bert https://www.quora.com/What-are-the-main-differences-between-the-word-embeddings-of-ELMo-BERT-Word2vec-and-GloVe • BERT improves on ELMo • Replace LSTM with Transformers, which deal better with long-term dependencies • True bidirectional architecture: left-to-right and right-to-left contexts are learned by the same network • 15% of words are randomly masked during training to prevent cheating • Pre-trained models: BERT Base and BERT Large • Masked word prediction • Next sentence prediction • Reference implementation with TensorFlow
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Limitations of BERT • BERT cannot handle more than 512 input tokens • BERT masks words during training, but not during fine-tuning (aka training/fine-tuning discrepancy) • BERT isn’t trained to predict the next word, so it’s not great at text generation • BERT doesn’t learn dependencies for masked words • Train « I am going to <MASK> my <MASK> » on « walk » / « dog », « eat » / « breakfast », and « debug » / « code ». • BERT could legitimately predict « I am going to eat my code » or « I am going to debug my dog » :-/
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. XLNet (06/2019) https://arxiv.org/abs/1906.08237 https://github.com/zihangdai/xlnet • XLNet beats BERT at 20 tasks • XLNet uses bidirectional context, but words are randomly permuted • No cheating possible • No masking required • XLNet Base and XLNet Large • Reference implementation with TensorFlow 07/2019: ERNIE 2.0 (Baidu) beats BERT & XLNet https://github.com/PaddlePaddle/ERNIE/
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Summing things up • Word2Vec and friends • Try pre-trained embeddings first • Check that the training corpus is similar to your own data • Same language, similar vocabulary • Remember that subword models will help with unknown / mispelled words • If you have exotic requirements AND lots of data, training is not expensive • ElMo, BERT, XLNet • Training is very expensive: several days using several GPUs • Fine-tuning is cheap: just a few GPU hours for SOTA results • Fine-tuning scripts and pre-trained models are available: start there! • The ”best model” is the one that works best on your business problem
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Demo: finding similarities and analogies with Gluon NLP and pre-trained GloVe embeddings https://gitlab.com/juliensimon/dlnotebooks/gluonnlp/
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Demo: embeddings with ELMo on TensorFlow https://gitlab.com/juliensimon/dlnotebooks/blob/master/nlp/ELMO%20TensorFlow.ipynb
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Getting started on AWS https://ml.aws https://aws.amazon.com/marketplace/solutions/machine-learning/natural- language-processing → off the shelf algos and models that might just save you from this madness ;) https://aws.amazon.com/sagemaker https://github.com/awslabs/amazon-sagemaker-examples
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Julien Simon Global Evangelist, AI & Machine Learning @julsimon https://medium.com/@julsimon