TEXT REPRESENTATIONS
FOR DEEP LEARNING
ZACHARY BROWN
RVATECH SUMMIT - MARCH 14, 2019
OUTLINE
• Why Neural Methods for Natural Language Processing?
• Local vs. Global Word Representations
• Unsupervised Learning for Word Representations
• Including Sub-word Information
• Language Models for Contextual Representations
WHY NEURAL METHODS FOR NATURAL LANGUAGE PROCESSING?
• Natural Language Processing (NLP) has moved largely to neural methods in recent years
WHAT IS "MODERN" NATURAL LANGUAGE PROCESSING?
• Traditional NLP builds on years of research into language representation
  • Theoretical foundations can lead to model rigidity
  • Tasks often rely on manually generated and curated dictionaries and thesauruses
  • Built upon local word representations
• Neural methods, by contrast:
  • Require few to no assumptions about language
  • Offer model architectures purpose-built for tasks
  • Are a very active area of research, with most work open source
  • Can learn global and contextual word representations
LOCAL VS. GLOBAL WORD REPRESENTATIONS
LOCAL WORD REPRESENTATIONS
• Traditional approaches to word representations treat each word as a unique entity (e.g., a one-hot vector with one dimension per vocabulary word)
LOCAL VS. GLOBAL WORD REPRESENTATIONS
• Modern approaches move to dense vectors of a fixed dimension
• These dense vectors are global representations, and obey some notion of distance-based similarity
• Deep learning provides an avenue to compute these vectors in an unsupervised way
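The local-vs-global distinction can be made concrete with a short sketch. The vocabulary and the dense values below are invented purely for illustration; learned embeddings would place related words near each other in exactly this sense:

```python
import numpy as np

# Local (one-hot) representation: one dimension per vocabulary word.
vocab = ["the", "quick", "brown", "fox", "dog"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distinct one-hot vectors are always orthogonal: similarity carries
# no information about meaning.
print(cosine(one_hot["fox"], one_hot["dog"]))  # 0.0

# Global (dense) representation: small fixed dimension; values here are
# made up, but learned vectors behave this way for related words.
dense = {
    "fox": np.array([0.9, 0.1, 0.4]),
    "dog": np.array([0.8, 0.2, 0.5]),
    "the": np.array([-0.1, 0.9, -0.3]),
}
print(cosine(dense["fox"], dense["dog"]))  # close to 1.0
```

One-hot similarity is identically zero for different words, while dense vectors support the distance-based similarity described above.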
UNSUPERVISED LEARNING FOR WORD REPRESENTATIONS
• We generate a seemingly endless amount of text data each day
  • ~460k Tweets every minute
  • ~510k Facebook posts every minute
• We have accumulated vast amounts of text data in online repositories
  • Wikipedia has 5.8M (English) articles: 1.6M words!
https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#36a98a9260ba
https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
CONTEXTUAL INFORMATION IN FREE TEXT
• A sliding window over running text defines a center word and its surrounding context words
• Can we predict the probability of a context word for a given center word?
https://arxiv.org/pdf/1301.3781.pdf
CONTEXTUAL INFORMATION IN FREE TEXT
• Sliding the window over "the quick brown fox jumps over …" yields (center, context) training pairs:
(the, quick)
(the, brown)
(quick, the)
(quick, brown)
(quick, fox)
(brown, the)
(brown, quick)
(brown, fox)
(brown, jumps)
(fox, quick)
(fox, brown)
(fox, jumps)
(fox, over)
...
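The pair generation above can be sketched in a few lines of Python, using the familiar pangram the listed pairs imply and a window of two words on each side (which reproduces the pairs on the slide):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs from a list of tokens."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(sentence)
print(pairs[:5])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox')]
```

Every occurrence of a word contributes its own pairs, which is how the corpus-scale statistics discussed earlier come into play.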
COMPUTING EFFICIENT WORD REPRESENTATIONS
• Our goal is to learn all of the elements of the dense embedding matrix
• To do this, we treat the matrix as a matrix of model weights, which will be optimized
• To optimize, we start with a single example from our training set: (fox, quick)
• Grab the current vectors of weights for those words
• Use these vectors to calculate the probability of the context word, given the center word
• Then compare the probability to the true value (0 or 1), and update the weights accordingly
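A single update of this kind can be sketched with the binary (real pair vs. negative sample) variant of skip-gram, where the model scores a (center, context) pair with a sigmoid of a dot product. The vocabulary, vector dimension, and learning rate below are arbitrary choices for illustration, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "quick": 1, "brown": 2, "fox": 3}
dim, lr = 8, 0.1

# Two weight matrices: one row per word, for center and context roles.
W_center = rng.normal(scale=0.1, size=(len(vocab), dim))
W_context = rng.normal(scale=0.1, size=(len(vocab), dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(center, context, label):
    """label = 1 for an observed (center, context) pair, 0 for a negative sample."""
    v = W_center[vocab[center]].copy()
    u = W_context[vocab[context]].copy()
    p = sigmoid(v @ u)        # predicted probability that the pair is real
    grad = p - label          # gradient of the log-loss w.r.t. the score
    W_center[vocab[center]] = v - lr * grad * u
    W_context[vocab[context]] = u - lr * grad * v
    return p

# Repeated positive updates push the predicted probability toward 1.
for _ in range(300):
    p = train_step("fox", "quick", 1)
```

Real implementations mix in negative samples (label 0) drawn from the vocabulary, so that the model learns to separate observed pairs from random ones rather than inflating all scores.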
WORD2VEC EMBEDDING VISUALIZATION
https://projector.tensorflow.org/
INCLUDING SUB-WORD INFORMATION
• Word2vec really opened the doors for computing global representations of words, which can then be used for a variety of different tasks, but…
… is it possible to make it better?
• YES! What if we included sub-word information?
https://fasttext.cc/
INCLUDING SUB-WORD INFORMATION
• Word2vec seeks to find a unique vector for each individual word: quick
• FastText seeks to find a vector for sequences of characters within each word: <quick> → <qu, qui, uic, ick, ck>
• Each word is then the sum of its sub-word vectors: <qu + qui + uic + ick + ck> = quick
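The decomposition above can be sketched directly, using n = 3 as in the slide's example (the real FastText uses a range of n-gram lengths, and the random vectors below merely stand in for learned values):

```python
import numpy as np

def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers < and >."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

print(char_ngrams("quick"))  # ['<qu', 'qui', 'uic', 'ick', 'ck>']

rng = np.random.default_rng(0)
ngram_vectors = {}  # stand-in for learned n-gram embeddings

def word_vector(word, dim=4):
    """A word's vector is the sum of its sub-word (n-gram) vectors."""
    total = np.zeros(dim)
    for g in char_ngrams(word):
        if g not in ngram_vectors:
            ngram_vectors[g] = rng.normal(size=dim)
        total += ngram_vectors[g]
    return total

quick = word_vector("quick")
```

The boundary markers matter: they let the model distinguish a prefix like `<qu` from the same three characters occurring mid-word.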
INCLUDING SUB-WORD INFORMATION
• From there, FastText utilizes the same skip-gram model for training as word2vec (with low-level optimizations)
INCLUDING SUB-WORD INFORMATION
• Benefits of FastText
• Learn useful representations of prefixes and suffixes
• Learn useful word roots
• Out-of-vocabulary inference!
• Drawbacks of FastText
• Very large number of model parameters
• Known to be difficult to tune
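The out-of-vocabulary benefit follows directly from the n-gram sum: a word never seen in training still gets a vector from whichever n-gram vectors it shares with training words. A minimal sketch, with invented n-gram vectors for the trigrams of "quick":

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers < and >."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

# Hypothetical trained n-gram vectors (values invented for illustration).
trained = {"<qu": [0.2, 0.1], "qui": [0.1, 0.3], "uic": [0.0, 0.2],
           "ick": [0.4, 0.1], "ck>": [0.1, 0.0]}

def oov_vector(word):
    """Sum of whatever trained n-gram vectors the word shares; unknown
    n-grams contribute nothing."""
    dims = len(next(iter(trained.values())))
    vec = [0.0] * dims
    for g in char_ngrams(word):
        for i, x in enumerate(trained.get(g, [0.0] * dims)):
            vec[i] += x
    return vec

# "quickly" never appeared in training, but it shares the n-grams
# <qu, qui, uic, ick with "quick", so it gets a nonzero, related vector.
print(oov_vector("quickly"))
```

A pure word2vec model would have no vector at all for "quickly" here; the sub-word model degrades gracefully instead.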
LANGUAGE MODELS FOR CONTEXTUAL REPRESENTATIONS
THE NEED FOR CONTEXTUAL REPRESENTATIONS
• Skip-gram models are fantastic for computing a single, fixed representation for a given word or token, but…
… words can have multiple meanings depending on the context…
context matters.
LANGUAGE MODELS
• Language models aim to predict the next word in a sequence of words
DEEP CONTEXTUAL REPRESENTATIONS
• Neural language models can also be stacked, creating multiple representations
• A word's contextual representation can be built by combining the representations from each layer: fox = (layer 1) + (layer 2) + (layer 3) + …
https://allennlp.org/elmo, https://arxiv.org/abs/1810.04805
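As a toy illustration of the language-modeling objective itself (a bigram count model, vastly simpler than the neural LMs discussed here; the tiny corpus is invented):

```python
from collections import Counter, defaultdict

corpus = "the quick brown fox jumps over the lazy dog . the quick fox runs .".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Most likely next word, with its conditional probability."""
    counts = follows[word]
    total = sum(counts.values())
    best, c = counts.most_common(1)[0]
    return best, c / total

# "the" is followed by quick, lazy, quick in this corpus → ('quick', 2/3)
print(predict_next("the"))
```

Neural language models replace these counts with learned functions of the full preceding sequence, and it is their internal hidden states, one per layer, that serve as the contextual representations described above.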
THANK YOU FOR YOUR ATTENTION!
QUESTIONS?
More Related Content

What's hot

Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information RetrievalRoelof Pieters
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Jinpyo Lee
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesFelipe Moraes
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryRoelof Pieters
 
Is acquiring knowledge of verb subcategorization in English easier? A partial...
Is acquiring knowledge of verb subcategorization in English easier? A partial...Is acquiring knowledge of verb subcategorization in English easier? A partial...
Is acquiring knowledge of verb subcategorization in English easier? A partial...Yu Tamura
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Hady Elsahar
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextBayu Aldi Yansyah
 
(Deep) Neural Networks在 NLP 和 Text Mining 总结
(Deep) Neural Networks在 NLP 和 Text Mining 总结(Deep) Neural Networks在 NLP 和 Text Mining 总结
(Deep) Neural Networks在 NLP 和 Text Mining 总结君 廖
 

What's hot (20)

Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
Arabic question answering ‫‬
Arabic question answering ‫‬Arabic question answering ‫‬
Arabic question answering ‫‬
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Networks and Natural Language Processing
Networks and Natural Language ProcessingNetworks and Natural Language Processing
Networks and Natural Language Processing
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionary
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Language models
Language modelsLanguage models
Language models
 
1910 HCLT
1910 HCLT1910 HCLT
1910 HCLT
 
Is acquiring knowledge of verb subcategorization in English easier? A partial...
Is acquiring knowledge of verb subcategorization in English easier? A partial...Is acquiring knowledge of verb subcategorization in English easier? A partial...
Is acquiring knowledge of verb subcategorization in English easier? A partial...
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 
Thai Word Embedding with Tensorflow
Thai Word Embedding with Tensorflow Thai Word Embedding with Tensorflow
Thai Word Embedding with Tensorflow
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastText
 
(Deep) Neural Networks在 NLP 和 Text Mining 总结
(Deep) Neural Networks在 NLP 和 Text Mining 总结(Deep) Neural Networks在 NLP 和 Text Mining 总结
(Deep) Neural Networks在 NLP 和 Text Mining 总结
 

Similar to Text Representations for Deep learning

A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Assamese to English Statistical Machine Translation
Assamese to English Statistical Machine TranslationAssamese to English Statistical Machine Translation
Assamese to English Statistical Machine TranslationKalyanee Baruah
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsRoelof Pieters
 
An Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPAn Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPRrubaa Panchendrarajan
 
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹台灣資料科學年會
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 
What might a spoken corpus tell us about language
What might a spoken corpus tell us about languageWhat might a spoken corpus tell us about language
What might a spoken corpus tell us about languageUCLDH
 
Introduction to nlp
Introduction to nlpIntroduction to nlp
Introduction to nlpAmaan Shaikh
 
From Semantics to Self-supervised Learning for Speech and Beyond
From Semantics to Self-supervised Learning for Speech and BeyondFrom Semantics to Self-supervised Learning for Speech and Beyond
From Semantics to Self-supervised Learning for Speech and Beyondlinshanleearchive
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdfRamya Nellutla
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectorsOsebe Sammi
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA DATASCIENCE
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722LinkedIn
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722LinkedIn
 
Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017
Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017
Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017Lviv Startup Club
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageLushanthan Sivaneasharajah
 

Similar to Text Representations for Deep learning (20)

A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Assamese to English Statistical Machine Translation
Assamese to English Statistical Machine TranslationAssamese to English Statistical Machine Translation
Assamese to English Statistical Machine Translation
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
An Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPAn Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLP
 
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
What might a spoken corpus tell us about language
What might a spoken corpus tell us about languageWhat might a spoken corpus tell us about language
What might a spoken corpus tell us about language
 
Introduction to nlp
Introduction to nlpIntroduction to nlp
Introduction to nlp
 
From Semantics to Self-supervised Learning for Speech and Beyond
From Semantics to Self-supervised Learning for Speech and BeyondFrom Semantics to Self-supervised Learning for Speech and Beyond
From Semantics to Self-supervised Learning for Speech and Beyond
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdf
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectors
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722
 
Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017
Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017
Dmitry Spodarets - “Environment for data scientist” AI&BigDataDay 2017
 
Intro
IntroIntro
Intro
 
Intro
IntroIntro
Intro
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil Language
 
Machine translator Introduction
Machine translator IntroductionMachine translator Introduction
Machine translator Introduction
 

More from Zachary S. Brown

Working in NLP in the Age of Large Language Models
Working in NLP in the Age of Large Language ModelsWorking in NLP in the Age of Large Language Models
Working in NLP in the Age of Large Language ModelsZachary S. Brown
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Building and Deploying Scalable NLP Model Services
Building and Deploying Scalable NLP Model ServicesBuilding and Deploying Scalable NLP Model Services
Building and Deploying Scalable NLP Model ServicesZachary S. Brown
 
Deep Learning and Modern NLP
Deep Learning and Modern NLPDeep Learning and Modern NLP
Deep Learning and Modern NLPZachary S. Brown
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READZachary S. Brown
 

More from Zachary S. Brown (6)

Working in NLP in the Age of Large Language Models
Working in NLP in the Age of Large Language ModelsWorking in NLP in the Age of Large Language Models
Working in NLP in the Age of Large Language Models
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Building and Deploying Scalable NLP Model Services
Building and Deploying Scalable NLP Model ServicesBuilding and Deploying Scalable NLP Model Services
Building and Deploying Scalable NLP Model Services
 
Deep Learning and Modern NLP
Deep Learning and Modern NLPDeep Learning and Modern NLP
Deep Learning and Modern NLP
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READ
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 

Recently uploaded

Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men

Text Representations for Deep Learning

  • 1. TEXT REPRESENTATIONS FOR DEEP LEARNING ZACHARY BROWN RVATECH SUMMIT - MARCH 14, 2019
  • 2. OUTLINE • Why Neural Methods for Natural Language Processing? • Local vs. Global Word Representations • Unsupervised Learning for Word Representations • Including Sub-word Information • Language Models for Contextual Representations
  • 3. WHY NEURAL METHODS FOR NATURAL LANGUAGE PROCESSING?
  • 4. WHY NEURAL METHODS FOR NATURAL LANGUAGE PROCESSING? • Natural Language Processing (NLP) has moved largely to neural methods in recent years
  • 5. WHAT IS "MODERN" NATURAL LANGUAGE PROCESSING? • Natural Language Processing (NLP) has moved largely to neural methods in recent years • Traditional NLP builds on years of research into language representation • Theoretical foundations can lead to model rigidity • Tasks often rely on manually generated and curated dictionaries and thesauruses • Built upon local word representations
  • 6. WHAT IS "MODERN" NATURAL LANGUAGE PROCESSING? • Natural Language Processing (NLP) has moved largely to neural methods in recent years • Few to no assumptions need to be made • Model architectures purpose built for tasks • Very active area of research, with most open-source • Ability to learn global and contextualized word representations
  • 7. WHAT IS "MODERN" NATURAL LANGUAGE PROCESSING? • Natural Language Processing (NLP) has moved largely to neural methods in recent years • Few to no assumptions need to be made • Model architectures purpose built for tasks • Very active area of research, with most open-source • Ability to learn global and contextual word representations
  • 9. LOCAL WORD REPRESENTATIONS • Traditional approaches to word representations treat each word as a unique entity
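Treating each word as a unique entity corresponds to one-hot ("local") vectors. A minimal sketch over a toy vocabulary (the vocabulary and names here are illustrative, not from the slides):

```python
# Local (one-hot) word representations: each word in the vocabulary
# gets its own sparse vector, with a single 1 at the word's index.
vocab = ["the", "quick", "brown", "fox", "jumps", "over"]  # toy vocabulary

def one_hot(word, vocab):
    """Return the local (one-hot) vector for a word."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Distinct words are always orthogonal, so local vectors encode no
# similarity between words, and their size grows with the vocabulary.
fox, quick = one_hot("fox", vocab), one_hot("quick", vocab)
overlap = sum(a * b for a, b in zip(fox, quick))  # 0 for any distinct pair
```

The zero overlap between any two distinct words is exactly the limitation that motivates the dense, global representations in the next slides.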
  • 13. LOCAL VS. GLOBAL WORD REPRESENTATIONS • Modern approaches move to a fixed dimensional vector size, with dense vectors
  • 14. LOCAL VS. GLOBAL WORD REPRESENTATIONS
  • 15. LOCAL VS. GLOBAL WORD REPRESENTATIONS • These dense vectors are global representations, and obey some notion of distance-based similarity
  • 16. LOCAL VS. GLOBAL WORD REPRESENTATIONS • Deep learning provides an avenue to compute these vectors in an unsupervised way
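The distance-based similarity mentioned above is typically measured with cosine similarity. A small sketch with made-up dense vectors (the values are purely illustrative):

```python
import math

# Dense "global" vectors: fixed dimension, with distance encoding
# similarity. The values below are invented for illustration only.
vectors = {
    "fox": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.25],
    "the": [0.0, 0.9, -0.7],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Related words ("fox", "dog") end up closer than unrelated ones.
sim_related = cosine(vectors["fox"], vectors["dog"])
sim_unrelated = cosine(vectors["fox"], vectors["the"])
```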
  • 17. UNSUPERVISED LEARNING FOR WORD REPRESENTATIONS
  • 18. UNSUPERVISED LEARNING FOR WORD REPRESENTATIONS • We generate a seemingly endless amount of text data each day • ~ 460k Tweets every minute • ~ 510k Facebook posts every minute • We have accumulated vast amounts of text data in online repositories • Wikipedia has 5.8M (English) articles https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#36a98a9260ba
  • 19. UNSUPERVISED LEARNING FOR WORD REPRESENTATIONS • We generate a seemingly endless amount of text data each day • ~ 460k Tweets every minute • ~ 510k Facebook posts every minute • We have accumulated vast amounts of text data in online repositories • Wikipedia has 5.8M (English) articles 1.6M words! https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
  • 21. CONTEXTUAL INFORMATION IN FREE TEXT https://arxiv.org/pdf/1301.3781.pdf
  • 23. CONTEXTUAL INFORMATION IN FREE TEXT center word context words
  • 24. CONTEXTUAL INFORMATION IN FREE TEXT Can we predict the probability of a context word for a given center word?
  • 26. CONTEXTUAL INFORMATION IN FREE TEXT (the, quick) (the, brown)
  • 27. CONTEXTUAL INFORMATION IN FREE TEXT (the, quick) (the, brown) (quick, the) (quick, brown) (quick, fox)
  • 28. CONTEXTUAL INFORMATION IN FREE TEXT (the, quick) (the, brown) (quick, the) (quick, brown) (quick, fox) (brown, the) (brown, quick) (brown, fox) (brown, jumps)
  • 29. CONTEXTUAL INFORMATION IN FREE TEXT (the, quick) (the, brown) (quick, the) (quick, brown) (quick, fox) (brown, the) (brown, quick) (brown, fox) (brown, jumps) (fox, quick) (fox, brown) (fox, jumps) (fox, over)
  • 30. CONTEXTUAL INFORMATION IN FREE TEXT (the, quick) (the, brown) (quick, the) (quick, brown) (quick, fox) (brown, the) (brown, quick) (brown, fox) (brown, jumps) (fox, quick) (fox, brown) (fox, jumps) (fox, over) ...
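The (center, context) pairs built up on the slides can be generated mechanically. A sketch with a symmetric window of 2 words (window size is a tunable hyperparameter):

```python
# Enumerate (center, context) skip-gram training pairs, as on the slides,
# using a symmetric context window of 2 words on each side.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(sentence)
# pairs begins: ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...
```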
  • 32. COMPUTING EFFICIENT WORD REPRESENTATIONS • Our goal is to learn all of the elements in the dense matrix on the right
  • 33. COMPUTING EFFICIENT WORD REPRESENTATIONS • To do this, we treat this matrix as a matrix of model weights, which will be optimized
  • 34. COMPUTING EFFICIENT WORD REPRESENTATIONS • To optimize, we start with a single example from our training set: (fox, quick)
  • 35. COMPUTING EFFICIENT WORD REPRESENTATIONS • Grab the current vector of weights for those words: (fox, quick)
  • 36. COMPUTING EFFICIENT WORD REPRESENTATIONS • Use these vectors to calculate the probability of the context word, given the center word (fox, quick)
  • 37. COMPUTING EFFICIENT WORD REPRESENTATIONS • Then compare the probability to the true value (0 or 1), and update the weights accordingly (fox, quick)
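The compute-probability-then-update loop on the slides can be sketched with a full softmax over a tiny vocabulary. This is a toy version: word2vec itself avoids the full softmax via negative sampling or a hierarchical softmax, and all sizes and names below are illustrative.

```python
import math
import random

random.seed(0)
vocab = ["the", "quick", "brown", "fox", "jumps"]
dim = 4
# Two weight matrices to optimize: center-word and context-word vectors.
W_in = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
W_out = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def p_context(context, center):
    """Softmax probability of a context word given the center word."""
    scores = {w: sum(a * b for a, b in zip(W_in[center], W_out[w])) for w in vocab}
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[context]) / z

def sgd_step(center, context, lr=0.1):
    """One gradient step pushing p(context | center) toward 1."""
    errs = {w: p_context(w, center) - (1.0 if w == context else 0.0) for w in vocab}
    grad_center = [sum(errs[w] * W_out[w][k] for w in vocab) for k in range(dim)]
    for w in vocab:
        for k in range(dim):
            W_out[w][k] -= lr * errs[w] * W_in[center][k]
    for k in range(dim):
        W_in[center][k] -= lr * grad_center[k]

before = p_context("quick", "fox")   # from the training pair (fox, quick)
sgd_step("fox", "quick")
after = p_context("quick", "fox")    # probability rises after the update
```

Repeating this step over all pairs in the corpus is what gradually shapes `W_in` into the dense word vectors used downstream.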
  • 41. INCLUDING SUB-WORD INFORMATION • Word2vec really opened the doors for computing global representations of words, which can then be used for a variety of different tasks, but… … is it possible to make it better?
  • 42. INCLUDING SUB-WORD INFORMATION • Word2vec really opened the doors for computing global representations of words, which can then be used for a variety of different tasks, but… • YES!What if we included sub-word information? … is it possible to make it better? https://fasttext.cc/
  • 43. INCLUDING SUB-WORD INFORMATION • Word2vec seeks to find a unique vector for each individual word quick
  • 44. INCLUDING SUB-WORD INFORMATION • FastText seeks to find a vector for sequences of characters within each word <quick>
  • 45. INCLUDING SUB-WORD INFORMATION • FastText seeks to find a vector for sequences of characters within each word <quick> <qu qui uic ick ck>
  • 46. INCLUDING SUB-WORD INFORMATION • Each word is then the sum of each of the sub-word vectors <quick> <qu qui uic ick ck> + + + + + = quick
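The decomposition on the slide can be sketched directly: wrap the word in boundary markers, take all character trigrams, and sum the n-gram vectors (the vectors below are made-up placeholders; FastText also uses a range of n-gram lengths, not just trigrams):

```python
# FastText-style character n-grams: wrap the word in boundary markers
# < and >, take all trigrams, and keep the whole wrapped word as well.
def char_ngrams(word, n=3):
    wrapped = "<" + word + ">"
    grams = [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
    return grams + [wrapped]

grams = char_ngrams("quick")  # ['<qu', 'qui', 'uic', 'ick', 'ck>', '<quick>']

# The word vector is the sum of its n-gram vectors (toy 2-d vectors here);
# sharing n-grams across words is what enables out-of-vocabulary inference.
ngram_vecs = {g: [0.1 * i, 0.2] for i, g in enumerate(grams)}  # made-up values
word_vec = [sum(v[k] for v in ngram_vecs.values()) for k in range(2)]
```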
  • 47. INCLUDING SUB-WORD INFORMATION (the, quick) (the, brown) (quick, the) (quick, brown) (quick, fox) (brown, the) (brown, quick) (brown, fox) (brown, jumps) (fox, quick) (fox, brown) (fox, jumps) (fox, over) ... • From there, FastText utilizes the same skip-gram model for training as word2vec (with low-level optimizations)
  • 48. INCLUDING SUB-WORD INFORMATION • Benefits of FastText • Learn useful representations of prefixes and suffixes • Learn useful word roots • Out of vocabulary inference!! • Drawbacks of FastText • Very large number of model parameters • Known to be difficult to tune
  • 49. LANGUAGE MODELS FOR CONTEXTUAL REPRESENTATIONS
  • 50. THE NEED FOR CONTEXTUAL REPRESENTATIONS • Skip-gram models are fantastic for computing a single, fixed representation for a given word or token, but… … words can have multiple meanings depending on the context…
  • 51. THE NEED FOR CONTEXTUAL REPRESENTATIONS • Skip-gram models are fantastic for computing a single, fixed representation for a given word or token, but… … words can have multiple meanings depending on the context… context matters.
  • 52. LANGUAGE MODELS • Language models aim to predict the next word in a sequence of words
  • 53. CONTEXTUAL REPRESENTATIONS • Language models aim to predict the next vector in a sequence of words
  • 54. CONTEXTUAL REPRESENTATIONS • Language models aim to predict the next word in a sequence of words
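The next-word prediction task itself predates neural models. A minimal count-based bigram sketch over a toy corpus shows the objective; neural language models replace the counts with a network, but the task is the same:

```python
from collections import Counter, defaultdict

# A count-based bigram language model: estimate p(next | current)
# from bigram counts over a toy corpus.
corpus = "the quick brown fox jumps over the lazy dog".split()
bigrams = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    bigrams[cur][nxt] += 1

def p_next(cur, nxt):
    """Estimated probability that `nxt` follows `cur`."""
    total = sum(bigrams[cur].values())
    return bigrams[cur][nxt] / total if total else 0.0

def predict_next(cur):
    """Most likely next word after `cur`."""
    return bigrams[cur].most_common(1)[0][0]
```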
  • 55. • Neural language models can also be stacked, creating multiple representations DEEP CONTEXTUAL REPRESENTATIONS
  • 56. • Neural language models can also be stacked, creating multiple representations DEEP CONTEXTUAL REPRESENTATIONS fox = + + + …
  • 57. • Neural language models can also be stacked, creating multiple representations DEEP CONTEXTUAL REPRESENTATIONS https://allennlp.org/elmo, https://arxiv.org/abs/1810.04805 fox = + + + …
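The "fox = sum of layer representations" picture on the final slides can be sketched as a softmax-weighted sum of the stacked layers' vectors, in the style of ELMo. Every number below is illustrative, not real ELMo output:

```python
import math

# ELMo-style deep contextual representation: the word's final vector is a
# softmax-weighted sum of each stacked layer's representation, scaled by a
# task-specific factor. Layer vectors and weights here are made up.
layer_vecs = [            # one vector for "fox" from each stacked layer
    [0.2, 0.7],           # token/character layer
    [0.5, 0.1],           # lower LM layer
    [0.9, 0.4],           # upper LM layer
]
raw_weights = [0.1, 0.3, 0.6]   # learned per-layer scalars
gamma = 1.0                     # learned task-specific scale

exps = [math.exp(s) for s in raw_weights]
softmax_w = [e / sum(exps) for e in exps]
fox = [gamma * sum(w * vec[k] for w, vec in zip(softmax_w, layer_vecs))
       for k in range(2)]
```

Because the layer vectors are computed from the whole sequence, the same word gets a different `fox` vector in different sentences, which is exactly the contextual behavior the fixed skip-gram vectors lack.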