SlideShare a Scribd company logo
1 of 24
|
Presented By
Date
July 6, 2017
Sujit Pal, Elsevier Labs
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe
for text classification and similarity
| 2
INSPIRATION
| 3
ACKNOWLEDGEMENTS
| 4
AGENDA
• NLP Pipelines before Deep Learning
• Deconstructing the “Encode, Embed, Attend, Predict” pipeline.
• Example #1: Document Classification
• Example #2: Document Similarity
• Example #3: Sentence Similarity
| 5
NLP PIPELINES BEFORE DEEP LEARNING
• Document Collection centric
• Based on Information Retrieval
• Document collection to matrix
• Densify using feature reduction
• Feed into SVM for classification, etc.
| 6
NLP PIPELINES BEFORE DEEP LEARNING
• Idea borrowed from Machine Learning (ML)
• Represent categorical variables (words) as 1-hot vectors
• Represent sentences as matrix of 1-hot word vectors
• No distributional semantics at word level.
| 7
WORD EMBEDDINGS
• Word2Vec – predict word from
context (CBOW) or context from word
(skip-gram) shown here.
• Other embeddings – GloVe, FastText.
• Pretrained models available
• Encode word “meanings”.
| 8
STEP #1: EMBED
• Converts from word ID to word vector
• Change: replace 1-hot vectors with 3rd party embeddings.
• Embeddings encode distributional semantics
• Sentence represented as sequence of dense word vectors
| 9
STEP #2: ENCODE
• Converts sequence of vectors (word vectors) to a matrix (sentence matrix).
• Bag of words – concatenate word vectors together.
• Each row of sentence matrix encodes the meaning of each word in the context of
the sentence.
• Generally use LSTM (Long Short Term Memory) or GRU (Gated Recurrent Unit)
• Bidirectional processes words left to right and right to left and concatenates.
| 10
STEP #3: ATTEND
• Reduces matrix (sentence matrix) to a vector (sentence vector)
• Non-attention mechanism –Sum or Average/Max Pooling
• Attention tells what to keep during reduction to minimize information loss.
• Different kinds – matrix, matrix + context (learned), matrix + vector (provided),
matrix + matrix.
| 11
ATTENTION: MATRIX
• Proposed by Raffel, et al
• Intuition: select most important
element from each timestep
• Learnable weights W and b
depending on target.
• Code on Github
| 12
ATTENTION: MATRIX + VECTOR (LEARNED)
• Proposed by Lin, et al
• Intuition: select most important
element from each timestep and
weight with another learned vector u.
• Code on Github
| 13
ATTENTION: MATRIX + VECTOR (PROVIDED)
• Proposed by Cho, et al
• Intuition: select most important
element from each timestep and
weight it with a learned multiple of
a provided context vector
• Code on Github
| 14
ATTENTION: MATRIX + MATRIX
• Proposed by Parikh, et al
• Intuition: build alignment (similarity) matrix
by multiplying learned vectors from each
matrix, compute context vectors from the
alignment matrix, and mix with original
signal.
• Code on Github
| 15
STEP #4: PREDICT
• Convert reduced vector to a label.
• Generally uses shallow fully connected networks such as the one shown.
• Can also be modified to have a regression head (return the probabilities
from the softmax activation.
| 16
DOCUMENT CLASSIFICATION EXAMPLE – ITERATION #1
• 20 newsgroups dataset
• 40k training records
• 10k test records
• 20 classes
• Embed, Predict
• Bag of Words idea
• Sentence = bag of words
• Document = bag of sentences
• Code on Github
| 17
DOCUMENT CLASSIFICATION EXAMPLE – ITERATION #2
• Embed, Encode, Predict
• Hierarchical Encoding
• Sentence Encoder: converts
sequence of word vectors to
sentence vector.
• Document Encoder: converts
sequence of sentence vectors
to document vector.
• Sentence encoder Network
embedded inside Document
network.
• Code on Github
| 18
DOCUMENT CLASSIFICATION EXAMPLE – ITERATION #3 (a, b, c)
• Embed, Encode, Attend,
Predict
• Encode step returns matrix,
vector for each time step.
• Attend reduces matrix to
vector.
• 3 types of attention (all
except Matrix Matrix) applied
to different versions of model.
• Code on Github – (a), (b), (c)
| 19
DOCUMENT CLASSIFICATION EXAMPLE – RESULTS
| 20
DOCUMENT SIMILARITY EXAMPLE
• Data derived from 20 newsgroups
• Hierarchical Model (Word to Sentence and
sentence to document)
• Tried w/o Attention, Attention for sentence
encoding, and attention for both sentence
encoding and document compare
• Code in Github – (a), (b), (c)
| 21
SENTENCE SIMILARITY EXAMPLE
• 2012 Semantic Similarity Task dataset.
• Hierarchical Model (Word to Sentence and
sentence to document).
• Used Matrix Matrix Attention for
comparison
• Code in Github – without attention, with
attention
| 22
SUMMARY
• 4-step recipe is a principled approach to NLP with Deep Learning
• Embed step leverages availability of many pre-trained embeddings.
• Encode step generally uses Bidirectional LSTM to create position sensitive
features, possible to use CNN here also.
• Attention of 3 main types – matrix to vector, with or without implicit context,
matrix and vector to vector, and matrix and matrix to vector. Computes summary
with respect to input or context if provided.
• Predict step converts vector to probability distribution via softmax, usually with a
Fully Connected (Dense) network.
• Interesting pipelines can be composed using complete or partial subsequences of
the 4 step recipe.
| 23
REFERENCES
• Honnibal, M. (2016, November 10). Embed, encode, attend, predict: The new
deep learning formula for state-of-the-art NLP models.
• Liao, R. (2016, December 26). Text Classification, Part 3 – Hierarchical attention
network.
• Leonardblier, P. (2016, January 20). Attention Mechanism
• Raffel, C., & Ellis, D. P. (2015). Feed-forward networks with attention can solve
some long-term memory problems. arXiv preprint arXiv:1512.08756.
• Yang, Z., et al. (2016). Hierarchical attention networks for document classification.
In Proceedings of NAACL-HLT (pp. 1480-1489).
• Cho, K., et al. (2015). Describing multimedia content using attention-based
encoder-decoder networks. IEEE Transactions on Multimedia, 17(11), 1875-1886.
• Parikh, A. P., et al. (2016). A decomposable attention model for natural language
inference. arXiv preprint arXiv:1606.01933.
| 24
THANK YOU
• Code: https://github.com/sujitpal/eeap-examples
• Slides: https://www.slideshare.net/sujitpal/presentation-slides-77511261
• Email: sujit.pal@elsevier.com
• Twitter: @palsujit
50% off on EBook
Discount Code EBDEEP50
Valid till Oct 31 2017

More Related Content

What's hot

ML Label engineering and N-Hot Encoders
ML Label engineering and N-Hot EncodersML Label engineering and N-Hot Encoders
ML Label engineering and N-Hot EncodersMor Krispil
 
Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSujit Pal
 
Netflix recommendation systems
Netflix recommendation systemsNetflix recommendation systems
Netflix recommendation systemsMina Tafreshi
 
Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming KrishnaMildain
 
Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)
Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)
Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)CriatividadeZeroDocs
 
LLM avalanche June 2023.pdf
LLM avalanche June 2023.pdfLLM avalanche June 2023.pdf
LLM avalanche June 2023.pdfCharles Martin
 
Python Libraries and Modules
Python Libraries and ModulesPython Libraries and Modules
Python Libraries and ModulesRaginiJain21
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
Python PPT
Python PPTPython PPT
Python PPTEdureka!
 
Store Item Demand Forecasting - AIT 582
Store Item Demand Forecasting - AIT 582 Store Item Demand Forecasting - AIT 582
Store Item Demand Forecasting - AIT 582 Rahul Pandey
 
An introduction to Jupyter notebooks and the Noteable service
An introduction to Jupyter notebooks and the Noteable serviceAn introduction to Jupyter notebooks and the Noteable service
An introduction to Jupyter notebooks and the Noteable serviceJisc
 
Presentation on python
Presentation on pythonPresentation on python
Presentation on pythonwilliam john
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with AnacondaTravis Oliphant
 
Introduction To Python | Edureka
Introduction To Python | EdurekaIntroduction To Python | Edureka
Introduction To Python | EdurekaEdureka!
 

What's hot (20)

ML Label engineering and N-Hot Encoders
ML Label engineering and N-Hot EncodersML Label engineering and N-Hot Encoders
ML Label engineering and N-Hot Encoders
 
Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentation
 
Chapter 03 python libraries
Chapter 03 python librariesChapter 03 python libraries
Chapter 03 python libraries
 
Netflix recommendation systems
Netflix recommendation systemsNetflix recommendation systems
Netflix recommendation systems
 
Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming
 
Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)
Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)
Projetos de algoritmos com implementações em pascal e c (nivio ziviani, 4ed)
 
Week 11: Programming for Data Analysis
Week 11: Programming for Data AnalysisWeek 11: Programming for Data Analysis
Week 11: Programming for Data Analysis
 
Curso MySQL #01 - Surgimento dos Bancos de Dados
Curso MySQL #01 - Surgimento dos Bancos de DadosCurso MySQL #01 - Surgimento dos Bancos de Dados
Curso MySQL #01 - Surgimento dos Bancos de Dados
 
Google colab introduction
Google colab   introductionGoogle colab   introduction
Google colab introduction
 
LLM avalanche June 2023.pdf
LLM avalanche June 2023.pdfLLM avalanche June 2023.pdf
LLM avalanche June 2023.pdf
 
Python Libraries and Modules
Python Libraries and ModulesPython Libraries and Modules
Python Libraries and Modules
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Python PPT
Python PPTPython PPT
Python PPT
 
Store Item Demand Forecasting - AIT 582
Store Item Demand Forecasting - AIT 582 Store Item Demand Forecasting - AIT 582
Store Item Demand Forecasting - AIT 582
 
Banco De Dados
Banco De DadosBanco De Dados
Banco De Dados
 
An introduction to Jupyter notebooks and the Noteable service
An introduction to Jupyter notebooks and the Noteable serviceAn introduction to Jupyter notebooks and the Noteable service
An introduction to Jupyter notebooks and the Noteable service
 
Presentation on python
Presentation on pythonPresentation on python
Presentation on python
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Introduction To Python | Edureka
Introduction To Python | EdurekaIntroduction To Python | Edureka
Introduction To Python | Edureka
 
Pandas
PandasPandas
Pandas
 

Similar to Embed, Encode, Attend, Predict - Applying the 4 Step NLP Recipe

Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...PyData
 
Topic Extraction using Machine Learning
Topic Extraction using Machine LearningTopic Extraction using Machine Learning
Topic Extraction using Machine LearningSanjib Basak
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016MLconf
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabSri Ambati
 
Topic extraction using machine learning
Topic extraction using machine learningTopic extraction using machine learning
Topic extraction using machine learningSanjib Basak
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Experfy
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Dalei Li
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OSri Ambati
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categoriesWarNik Chow
 
Chapter10.pptx
Chapter10.pptxChapter10.pptx
Chapter10.pptxadnansbp
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNAVER Engineering
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingLionel Briand
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2OSri Ambati
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptxSabthamiS1
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdfFEG
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...ssuser4b1f48
 
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...krisztianbalog
 
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit
 

Similar to Embed, Encode, Attend, Predict - Applying the 4 Step NLP Recipe (20)

Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
 
Topic Extraction using Machine Learning
Topic Extraction using Machine LearningTopic Extraction using Machine Learning
Topic Extraction using Machine Learning
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
Topic extraction using machine learning
Topic extraction using machine learningTopic extraction using machine learning
Topic extraction using machine learning
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
Chapter10.pptx
Chapter10.pptxChapter10.pptx
Chapter10.pptx
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software Testing
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptx
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
 
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
 

More from Sujit Pal

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Sujit Pal
 
Cheap Trick for Question Answering
Cheap Trick for Question AnsweringCheap Trick for Question Answering
Cheap Trick for Question AnsweringSujit Pal
 
Searching Across Images and Test
Searching Across Images and TestSearching Across Images and Test
Searching Across Images and TestSujit Pal
 
Learning a Joint Embedding Representation for Image Search using Self-supervi...
Learning a Joint Embedding Representation for Image Search using Self-supervi...Learning a Joint Embedding Representation for Image Search using Self-supervi...
Learning a Joint Embedding Representation for Image Search using Self-supervi...Sujit Pal
 
The power of community: training a Transformer Language Model on a shoestring
The power of community: training a Transformer Language Model on a shoestringThe power of community: training a Transformer Language Model on a shoestring
The power of community: training a Transformer Language Model on a shoestringSujit Pal
 
Backprop Visualization
Backprop VisualizationBackprop Visualization
Backprop VisualizationSujit Pal
 
Accelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn CloudAccelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn CloudSujit Pal
 
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Sujit Pal
 
Leslie Smith's Papers discussion for DL Journal Club
Leslie Smith's Papers discussion for DL Journal ClubLeslie Smith's Papers discussion for DL Journal Club
Leslie Smith's Papers discussion for DL Journal ClubSujit Pal
 
Using Graph and Transformer Embeddings for Vector Based Retrieval
Using Graph and Transformer Embeddings for Vector Based RetrievalUsing Graph and Transformer Embeddings for Vector Based Retrieval
Using Graph and Transformer Embeddings for Vector Based RetrievalSujit Pal
 
Transformer Mods for Document Length Inputs
Transformer Mods for Document Length InputsTransformer Mods for Document Length Inputs
Transformer Mods for Document Length InputsSujit Pal
 
Question Answering as Search - the Anserini Pipeline and Other Stories
Question Answering as Search - the Anserini Pipeline and Other StoriesQuestion Answering as Search - the Anserini Pipeline and Other Stories
Question Answering as Search - the Anserini Pipeline and Other StoriesSujit Pal
 
Building Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDSBuilding Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDSSujit Pal
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingSujit Pal
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildSujit Pal
 
Search summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSearch summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSujit Pal
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSujit Pal
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchSujit Pal
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Sujit Pal
 

More from Sujit Pal (20)

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...Building Learning to Rank (LTR) search reranking models using Large Language ...
Building Learning to Rank (LTR) search reranking models using Large Language ...
 
Cheap Trick for Question Answering
Cheap Trick for Question AnsweringCheap Trick for Question Answering
Cheap Trick for Question Answering
 
Searching Across Images and Test
Searching Across Images and TestSearching Across Images and Test
Searching Across Images and Test
 
Learning a Joint Embedding Representation for Image Search using Self-supervi...
Learning a Joint Embedding Representation for Image Search using Self-supervi...Learning a Joint Embedding Representation for Image Search using Self-supervi...
Learning a Joint Embedding Representation for Image Search using Self-supervi...
 
The power of community: training a Transformer Language Model on a shoestring
The power of community: training a Transformer Language Model on a shoestringThe power of community: training a Transformer Language Model on a shoestring
The power of community: training a Transformer Language Model on a shoestring
 
Backprop Visualization
Backprop VisualizationBackprop Visualization
Backprop Visualization
 
Accelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn CloudAccelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn Cloud
 
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19
 
Leslie Smith's Papers discussion for DL Journal Club
Leslie Smith's Papers discussion for DL Journal ClubLeslie Smith's Papers discussion for DL Journal Club
Leslie Smith's Papers discussion for DL Journal Club
 
Using Graph and Transformer Embeddings for Vector Based Retrieval
Using Graph and Transformer Embeddings for Vector Based RetrievalUsing Graph and Transformer Embeddings for Vector Based Retrieval
Using Graph and Transformer Embeddings for Vector Based Retrieval
 
Transformer Mods for Document Length Inputs
Transformer Mods for Document Length InputsTransformer Mods for Document Length Inputs
Transformer Mods for Document Length Inputs
 
Question Answering as Search - the Anserini Pipeline and Other Stories
Question Answering as Search - the Anserini Pipeline and Other StoriesQuestion Answering as Search - the Anserini Pipeline and Other Stories
Question Answering as Search - the Anserini Pipeline and Other Stories
 
Building Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDSBuilding Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDS
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language Processing
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Search summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSearch summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slides
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Embed, Encode, Attend, Predict - Applying the 4 Step NLP Recipe

  • 1. | Presented By Date July 6, 2017 Sujit Pal, Elsevier Labs Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text classification and similarity
  • 4. | 4 AGENDA • NLP Pipelines before Deep Learning • Deconstructing the “Encode, Embed, Attend, Predict” pipeline. • Example #1: Document Classification • Example #2: Document Similarity • Example #3: Sentence Similarity
  • 5. | 5 NLP PIPELINES BEFORE DEEP LEARNING • Document Collection centric • Based on Information Retrieval • Document collection to matrix • Densify using feature reduction • Feed into SVM for classification, etc.
  • 6. | 6 NLP PIPELINES BEFORE DEEP LEARNING • Idea borrowed from Machine Learning (ML) • Represent categorical variables (words) as 1-hot vectors • Represent sentences as matrix of 1-hot word vectors • No distributional semantics at word level.
  • 7. | 7 WORD EMBEDDINGS • Word2Vec – predict word from context (CBOW) or context from word (skip-gram) shown here. • Other embeddings – GloVe, FastText. • Pretrained models available • Encode word “meanings”.
  • 8. | 8 STEP #1: EMBED • Converts from word ID to word vector • Change: replace 1-hot vectors with 3rd party embeddings. • Embeddings encode distributional semantics • Sentence represented as sequence of dense word vectors
  • 9. | 9 STEP #2: ENCODE • Converts sequence of vectors (word vectors) to a matrix (sentence matrix). • Bag of words – concatenate word vectors together. • Each row of sentence matrix encodes the meaning of each word in the context of the sentence. • Generally use LSTM (Long Short Term Memory) or GRU (Gated Recurrent Unit) • Bidirectional processes words left to right and right to left and concatenates.
  • 10. | 10 STEP #3: ATTEND • Reduces matrix (sentence matrix) to a vector (sentence vector) • Non-attention mechanism –Sum or Average/Max Pooling • Attention tells what to keep during reduction to minimize information loss. • Different kinds – matrix, matrix + context (learned), matrix + vector (provided), matrix + matrix.
  • 11. | 11 ATTENTION: MATRIX • Proposed by Raffel, et al • Intuition: select most important element from each timestep • Learnable weights W and b depending on target. • Code on Github
  • 12. | 12 ATTENTION: MATRIX + VECTOR (LEARNED) • Proposed by Lin, et al • Intuition: select most important element from each timestep and weight with another learned vector u. • Code on Github
  • 13. | 13 ATTENTION: MATRIX + VECTOR (PROVIDED) • Proposed by Cho, et al • Intuition: select most important element from each timestep and weight it with a learned multiple of a provided context vector • Code on Github
  • 14. | 14 ATTENTION: MATRIX + MATRIX • Proposed by Parikh, et al • Intuition: build alignment (similarity) matrix by multiplying learned vectors from each matrix, compute context vectors from the alignment matrix, and mix with original signal. • Code on Github
  • 15. | 15 STEP #4: PREDICT • Convert reduced vector to a label. • Generally uses shallow fully connected networks such as the one shown. • Can also be modified to have a regression head (return the probabilities from the softmax activation.
  • 16. | 16 DOCUMENT CLASSIFICATION EXAMPLE – ITERATION #1 • 20 newsgroups dataset • 40k training records • 10k test records • 20 classes • Embed, Predict • Bag of Words idea • Sentence = bag of words • Document = bag of sentences • Code on Github
  • 17. | 17 DOCUMENT CLASSIFICATION EXAMPLE – ITERATION #2 • Embed, Encode, Predict • Hierarchical Encoding • Sentence Encoder: converts sequence of word vectors to sentence vector. • Document Encoder: converts sequence of sentence vectors to document vector. • Sentence encoder Network embedded inside Document network. • Code on Github
  • 18. | 18 DOCUMENT CLASSIFICATION EXAMPLE – ITERATION #3 (a, b, c) • Embed, Encode, Attend, Predict • Encode step returns matrix, vector for each time step. • Attend reduces matrix to vector. • 3 types of attention (all except Matrix Matrix) applied to different versions of model. • Code on Github – (a), (b), (c)
  • 19. | 19 DOCUMENT CLASSIFICATION EXAMPLE – RESULTS
  • 20. | 20 DOCUMENT SIMILARITY EXAMPLE • Data derived from 20 newsgroups • Hierarchical Model (Word to Sentence and sentence to document) • Tried w/o Attention, Attention for sentence encoding, and attention for both sentence encoding and document compare • Code in Github – (a), (b), (c)
  • 21. | 21 SENTENCE SIMILARITY EXAMPLE • 2012 Semantic Similarity Task dataset. • Hierarchical Model (Word to Sentence and sentence to document). • Used Matrix Matrix Attention for comparison • Code in Github – without attention, with attention
  • 22. | 22 SUMMARY • 4-step recipe is a principled approach to NLP with Deep Learning • Embed step leverages availability of many pre-trained embeddings. • Encode step generally uses Bidirectional LSTM to create position sensitive features, possible to use CNN here also. • Attention of 3 main types – matrix to vector, with or without implicit context, matrix and vector to vector, and matrix and matrix to vector. Computes summary with respect to input or context if provided. • Predict step converts vector to probability distribution via softmax, usually with a Fully Connected (Dense) network. • Interesting pipelines can be composed using complete or partial subsequences of the 4 step recipe.
  • 23. | 23 REFERENCES • Honnibal, M. (2016, November 10). Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models. • Liao, R. (2016, December 26). Text Classification, Part 3 – Hierarchical attention network. • Leonardblier, P. (2016, January 20). Attention Mechanism • Raffel, C., & Ellis, D. P. (2015). Feed-forward networks with attention can solve some long-term memory problems. arXiv preprint arXiv:1512.08756. • Yang, Z., et al. (2016). Hierarchical attention networks for document classification. In Proceedings of NAACL-HLT (pp. 1480-1489). • Cho, K., et al. (2015). Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11), 1875-1886. • Parikh, A. P., et al. (2016). A decomposable attention model for natural language inference. arXiv preprint arXiv:1606.01933.
  • 24. | 24 THANK YOU • Code: https://github.com/sujitpal/eeap-examples • Slides: https://www.slideshare.net/sujitpal/presentation-slides-77511261 • Email: sujit.pal@elsevier.com • Twitter: @palsujit 50% off on EBook Discount Code EBDEEP50 Valid till Oct 31 2017