Convolution Neural Nets for
Language Modeling
Anuj Gupta
Lead Data Scientist, FreshWorks
@anujgupta82
anujgupta82@gmail.com
Agenda
• Background
• CNN
• Language modeling
• Intuition behind this fusion
• Deep dive
• Key take home
Building Blocks
Convolutional Neural Nets (CNN)
• Introduced by Yann LeCun in 1998*
• Have been hugely successful in the area of vision; they have almost become the bread and butter of
computer vision problems.
• A CNN treats an image as a signal in the spatial domain.
• CNNs have many nice properties that make them super useful:
• Spatial invariance – translation, rotation
• Local structure
• Fast (concurrent calculations in each layer)
* LeNet-5 in "Gradient-based learning applied to document recognition"
Basics of CNN
• Input: an image.
• An image is nothing but a signal in space.
• It is represented by a matrix of values (RGB).
• Each value ~ the intensity of the red, green and blue channels respectively.
• The 2 key operations are: Convolution & Pooling.
Convolution
• In the simplest terms: given 2 signals x() and h(), convolution combines the
2 signals:
(x ∗ h)(t) = ∫ x(τ) h(t − τ) dτ
• In the discrete space:
(x ∗ h)[n] = Σm x[m] h[n − m]
• For our case, the image is x().
• h() is called the filter/kernel/feature detector – a well-known concept in the world
of image processing.
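To make the discrete formula concrete, here is a minimal Python/NumPy sketch (not part of the original slides); the signal and filter values are arbitrary toy numbers.

import numpy as np

# Two short discrete signals: an input x and a filter/kernel h (toy values).
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0])

# np.convolve computes y[n] = sum_m x[m] * h[n - m] ("full" mode by default).
y = np.convolve(x, h)
print(y)  # [ 1.  2.  2.  2. -3. -4.]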
• Ex: filters for edge detection, blurring, sharpening, etc.
• A filter is usually a small matrix – 3x3, 5x5, 5x7, etc.
• There are well-known predefined filters.
https://en.wikipedia.org/wiki/Kernel_(image_processing)
• A convolved feature is nothing but taking a part of the image and applying the
filter over it – taking pairwise products and adding them.

Filter:
1 0 1
0 1 0
1 0 1

Image:
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Applying the filter to the top-left 3x3 patch of the image:
1*1 + 1*0 + 1*1
+ 0*0 + 1*1 + 1*0
+ 0*1 + 0*0 + 1*1 = 4
• A convolved feature map is nothing but sliding the filter over the entire image
and applying convolution at each step, as in the diagram below (a NumPy sketch follows):

Filter:
1 0 1
0 1 0
1 0 1

https://stats.stackexchange.com/questions/154798/difference-between-kernel-and-filter-in-cnn
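The sliding-window operation from the last two slides can be sketched in a few lines of NumPy (illustrative code, not from the slides). It uses the same 5x5 image and 3x3 filter as above; note that, like most deep-learning libraries, it applies the filter without flipping it (strictly speaking a cross-correlation).

import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

def conv2d_valid(img, k):
    """Slide the kernel over the image (no padding, stride 1) and
    take the element-wise product-sum at each position."""
    kh, kw = k.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

feature_map = conv2d_valid(image, kernel)
print(feature_map[0, 0])  # 4 -- the value computed on the slide
print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]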
• Image processing over the past many decades has built many filters for specific tasks.
• In DL (CNNs), rather than using predefined filters, we learn the filters.
• We start with small random values and update them using gradients.
? ? ?
? ? ?
? ? ?
(a 3x3 filter whose values are learned rather than predefined)
Pooling
• It’s a simple technique for down/sub-sampling.
• In CNNs, down-sampling or "pooling" layers are often placed after
convolutional layers.
• They are used mainly to reduce the feature-map dimensionality for
computational efficiency, which in turn improves actual performance.
• Pooling takes disjoint chunks of the image (typically 2×2) and aggregates
them into a single value.
• Average, max, min, etc. The most popular is max-pooling.
https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html
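A minimal NumPy sketch of 2x2 max-pooling (illustrative, not from the slides): disjoint 2x2 blocks of a feature map are each reduced to their maximum.

import numpy as np

def max_pool(feature_map, size=2):
    """Down-sample a 2D feature map by taking the max over
    disjoint size x size blocks (stride = size, no padding)."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]])

print(max_pool(fm))
# [[6 5]
#  [7 9]]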
Putting it all together
https://adeshpande3.github.io
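As an illustration of how the pieces combine, here is a toy forward pass (convolution -> ReLU -> max-pooling -> fully connected -> softmax) in NumPy. This is a sketch only: all weights are random stand-ins for learned parameters, and the sizes are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, k):
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def max_pool_2x2(x):
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy "image" and randomly initialised parameters (stand-ins for learned ones).
image   = rng.random((8, 8))
kernels = rng.standard_normal((4, 3, 3))        # 4 learnable 3x3 filters
W_fc    = rng.standard_normal((4 * 3 * 3, 10))  # dense layer: flattened maps -> 10 classes
b_fc    = np.zeros(10)

# conv -> ReLU -> 2x2 max-pool, per filter
maps   = np.stack([np.maximum(conv2d_valid(image, k), 0.0) for k in kernels])  # (4, 6, 6)
pooled = np.stack([max_pool_2x2(m) for m in maps])                             # (4, 3, 3)

# flatten -> fully connected -> softmax over classes
logits = pooled.reshape(-1) @ W_fc + b_fc
probs  = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape, probs.sum())  # (10,) 1.0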
Language Modeling
• Filter out good sentences from bad ones.
• Good = semantically and syntactically correct.
• Model this via a probability distribution over sequences of words: Pr(w1, w2, ….., wn)
• Assign a probability to a sentence such that
S1 = “the cat jumped over the dog”, Pr(S1) ~ 1
S2 = “jumped over the the cat dog”, Pr(S2) ~ 0
Language Modeling
• Machine Translation:
• P(high winds tonite) > P(large winds tonite)
• Spell Correction :
• The office is about fifteen minuets from my house.
• P(about fifteen minutes from) > P(about fifteen minuets from)
• Speech Recognition :
• P(I saw a van) >> P(eyes awe of an)
• Summarization, question answering, etc.
• Unigram (“unary”) language models: assume each word occurs completely independently of the others.
• Overly simplistic!
• Bigram (“binary”) language models: a word in a sentence is influenced only by its immediate
predecessor (a.k.a. the bigram setting).
• This too is naïve, but it goes a long way in understanding some key concepts.
• N-gram models: try to capture long-term dependencies.
Pr(w1, w2, ….., wn) = ∏i=1…n Pr(wi | w1, w2, ….., wi-1)
• This captures how likely a sentence is in a given language.
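For example, under the bigram approximation Pr(wi | w1, ….., wi-1) ≈ Pr(wi | wi-1), a language model can be estimated from raw counts. The Python sketch below is illustrative only (the toy corpus and the <s>/</s> boundary markers are assumptions of this sketch, and no smoothing is applied); it assigns a higher probability to a well-formed sentence than to a scrambled one, as in the earlier example.

from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries (an assumption of this sketch).
corpus = [
    "<s> the cat jumped over the dog </s>",
    "<s> the dog slept </s>",
    "<s> the cat slept </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens[:-1])             # history counts
    bigrams.update(zip(tokens, tokens[1:]))  # (w_{i-1}, w_i) counts

def bigram_prob(prev, word):
    # Maximum-likelihood estimate of Pr(word | prev); no smoothing here.
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob("the cat slept"))            # 0.25 -- relatively likely
print(sentence_prob("jumped over the the cat"))  # 0.0  -- contains unseen bigrams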
Deep Learning + Language Modeling
• Traditionally uses architectures such as Recurrent Neural Networks (RNNs).
• Sequential processing: one unit after another.
• Over time, advancements such as two-way ordering (bidirectional RNNs), memory (LSTM),
attention, etc. got added.
• Some people explored the possibility of using CNNs for language modeling:
• Pixels are spread in space, so they are nothing but a signal in space.
• Words/tokens/characters are spread in time, so they are nothing but a signal in time.
CNNs for Language Modeling
• The input for any NLP task is sentences/paras/docs in the form of a matrix.
• Each row of this matrix represents a unit/token of text – character, morpheme,
word, etc. (typically a row = the 1-hot or embedding representation of that unit).
• Unlike images, where the filter slides over local patches of the image, in NLP we
typically use filters that slide over full rows of the matrix, i.e. the “width” of our
filters is usually the same as the width of the input matrix. [1D or temporal
convolutions]
• The height, or region size, varies. Typically, the window slides over 2-5 words at a
time. (A sketch of such a temporal convolution follows.)
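A NumPy sketch of such a temporal convolution (illustrative, not from the slides): one filter of region size 3 spans full rows of the sentence matrix, producing one value per 3-word window. The embeddings and filter values here are random stand-ins for learned parameters.

import numpy as np

rng = np.random.default_rng(0)

sentence = "the cat jumped over the dog".split()
vocab = {w: i for i, w in enumerate(sorted(set(sentence)))}

embed_dim = 8
region = 3                                        # the filter spans 3 consecutive words
E = rng.standard_normal((len(vocab), embed_dim))  # toy word embeddings
W = rng.standard_normal((region, embed_dim))      # one filter, width = embedding dim
b = 0.0

X = E[[vocab[w] for w in sentence]]               # sentence matrix, shape (6, 8)

# Slide the filter over full rows: one scalar per 3-word window.
features = np.array([
    np.tanh(np.sum(X[i:i + region] * W) + b)
    for i in range(len(sentence) - region + 1)
])
print(features.shape)  # (4,) -- one value per window of 3 words
print(features.max())  # max-over-time pooling gives one feature per filter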
• A lot of the success of CNNs is attributed to:
• Location invariance: where an object appears in an image doesn’t matter so much.
• Local compositionality: a bunch of local objects combine/compose to give more complex
objects.
• In CNN+NLP, both of the aforementioned properties go for a toss.
• Where a word appears in a sentence can change the meaning drastically.
Man bites dog.
Dog bites man.
• Parts of phrases could be separated by several other words. Words do compose in some ways,
but how exactly this works, and what higher-level representations actually “mean” – these aren’t as
obvious as in the computer vision case.
“Tim said Robert has a lot of experience, he feels you should definitely meet him”
• With both key advantages gone, why are we even thinking of applying CNNs to text?
RNNs should be the way to go.
• “All models are wrong, but some are useful”
• This is not about CNNs vs RNNs (maybe both are bad!)
• This is about
• Understanding the key difficulties
• Asking whether there are some aspects of language modeling where CNNs can do a better job.
• It helps us to better understand the strengths & weaknesses of each model.
• It turns out that CNNs applied to certain NLP problems perform quite well, especially
classification tasks – Sentiment Analysis, Spam Detection or Topic
Categorization.
• CNNs are usually fast, very fast.
Major works in this sub-area
• Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. EMNLP 2014
• Santos, C. N. dos, & Gatti, M. (2014). Deep Convolutional Neural Networks for Sentiment
Analysis of Short Texts. COLING-2014
• Shen, Y., He, X., Gao, J., Deng, L., & Mesnil, G. (2014). A Latent Semantic Model with
Convolutional-Pooling Structure for Information Retrieval. CIKM ’14.
• Santos, C., & Zadrozny, B. (2014). Learning Character-level Representations for Part-of-Speech
Tagging. ICML-14.
• Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level Convolutional Networks for Text
Classification. NIPS 2015.
• Wenpeng Yin, Hinrich Schutze, Bing Xiang, and Bowen Zhou. 2016. ABCNN: attention-based
convolutional neural network for modeling sentence pairs.
• Ngoc Thang Vu, Heike Adel, Pankaj Gupta, and Hinrich Schutze. 2016. Combining recurrent
and convolutional neural networks for relation classification. In Proceedings of NAACL HLT.
pages 534–539.
• Ying Wen, Weinan Zhang, Rui Luo, and Jun Wang. 2016. Learning text representation using
recurrent convolutional neural network with highway layers. SIGIR Workshop on Neural
Information Retrieval.
• Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. 2016. Language modeling with
gated convolutional networks. arXiv preprint arXiv:1612.08083
• Wenpeng Yin, Katharina Kann, Mo Yu and Hinrich Schutze. 2017. Comparative Study of CNN and
RNN for Natural Language Processing. arXiv:1702.01923.
• Kim, Y., Jernite, Y., Sontag, D., & Rush, A. M. (2015). Character-Aware Neural Language
Models. (Uses a hybrid of CNN and RNN)
Deep Dive
Character-Aware Neural Language Models *
• Problem statement: given t words w1, w2, ….., wt, predict wt+1.
• Traditional models: words are fed as inputs in the form of word embeddings.
• Here the input embedding is replaced by the output of a character-level CNN.
• Uses sub-word information.
• Traditionally, sub-word information is fed in terms of morphemes;
Unbreakable: Un ("not") – break (root word) – able (“can be done”)
* “Character-Aware Neural Language Models”, Y. Kim et al., 2015
• Identifying morphemes is non-trivial. It requires morphological tagging as a
preprocessing step.
• Y. Kim et al. instead leverage sub-word information through a character-level CNN:
• Learn an embedding for each character.
• A word w is then nothing but the embeddings of its constituent characters.
• For each word, we apply convolution on its character embeddings to obtain features.
• These are then fed to an LSTM via highway layers.
• The model does not use word embeddings at all.
• In most language models, a large % of the parameters are due to the word
embeddings. Thus, we get a much smaller number of parameters to learn.
Details
C – the vocabulary of characters.
D – the dimensionality of character embeddings.
R – the matrix of character embeddings, of size D x |C|.
Let word wk = [c1, ...., cl], i.e. made up of l characters, where l
is the length of wk.
The character-level representation of wk is given by the matrix
Ck ∈ ℝ D x l, where the jth column corresponds to the character
embedding of the jth character of word wk.
Apply a filter/kernel H ∈ ℝ D x w to Ck to obtain the feature map fk of length l - w + 1.
The ith element of fk is given by:
fk[i] = tanh( ⟨ Ck[:, i : i + w − 1], H ⟩ + b )
where Ck[:, i : i + w − 1] is the ith to (i + w − 1)th columns of Ck,
and ⟨A, B⟩ = Tr(A Bᵀ) is the Frobenius inner product.
(The accompanying figure shows R as a D x |C| matrix, Ck as a D x l matrix with columns c1 … cl, and fk as a vector of length l − w + 1.)
• To capture the most important feature, we take the max over time:
y = max_i fk[i]
y is the feature corresponding to filter H when applied to word wk
(~ finding the most important character n-gram).
• Likewise, they apply multiple filters H1, …., Hh, giving features y1, …., yh.
• Then yk = [y1, …., yh] is the input representation of word wk.
At this point we can either:
• Construct an MLP over yk
• Feed yk to an LSTM
(A NumPy sketch of this character-level feature extraction follows.)
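The sketch below walks through these steps for a single word in NumPy. It is illustrative only: the dimensions, random embedding matrix and choice of filter widths are assumptions of this sketch, the bias term is omitted, and a real model learns R and the filters jointly with the rest of the network.

import numpy as np

rng = np.random.default_rng(0)

chars = "abcdefghijklmnopqrstuvwxyz"
char_to_id = {c: i for i, c in enumerate(chars)}

D = 15                                    # character embedding size (toy value)
R = rng.standard_normal((D, len(chars)))  # character embedding matrix (learned in practice)

def char_cnn_features(word, filters, widths):
    """yk for one word: for each filter H of width w,
    fk[i] = tanh(<Ck[:, i : i+w-1], H>), then max over i."""
    Ck = R[:, [char_to_id[c] for c in word]]      # D x l
    feats = []
    for H, w in zip(filters, widths):
        l = Ck.shape[1]
        fk = np.array([
            np.tanh(np.sum(Ck[:, i:i + w] * H))   # Frobenius inner product (bias omitted)
            for i in range(l - w + 1)
        ])
        feats.append(fk.max())                    # max-over-time pooling
    return np.array(feats)

# h = 3 filters of widths 2, 3 and 4 characters (toy values; the paper uses many more).
widths  = [2, 3, 4]
filters = [rng.standard_normal((D, w)) for w in widths]

y = char_cnn_features("unbreakable", filters, widths)
print(y.shape)  # (3,) -- this vector is the word's input representation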
Instead, to gain further improvements, rather than feeding yk directly to the LSTM, they pass it through a Highway
network*.
Highway network:
Basic idea: carry some part of the input directly to the output,
while the remaining input is processed and then taken forward.
Very similar to residual networks.
z = t ⊙ F(y) + (1 − t) ⊙ y
F() is typically an affine transformation followed by a tanh: F(y) = tanh(WH y + bH).
In Highway networks, we learn “what parts of the input are to be
carried forward via the highway”.
This is done via a gating mechanism: a transform gate t = σ(WT y + bT) and a carry gate (1 − t).
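A minimal NumPy sketch of one highway layer following the formulas above (illustrative only; the random weights are stand-ins for learned parameters, and initialising bT to a negative value, so the layer starts out mostly carrying its input, is a common choice rather than something specified on this slide).

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(y, W_H, b_H, W_T, b_T):
    """z = t * F(y) + (1 - t) * y, with transform gate
    t = sigmoid(W_T y + b_T) and carry gate (1 - t)."""
    t = sigmoid(W_T @ y + b_T)
    f = np.tanh(W_H @ y + b_H)     # the processed path F(y)
    return t * f + (1.0 - t) * y   # gated mix of processed and carried input

dim = 6
y = rng.standard_normal(dim)       # e.g. the word feature vector yk from the CNN
W_H = rng.standard_normal((dim, dim))
W_T = rng.standard_normal((dim, dim))
b_H = np.zeros(dim)
b_T = -2.0 * np.ones(dim)          # negative bias -> initially carry most of the input

z = highway_layer(y, W_H, b_H, W_T, b_T)
print(z.shape)  # (6,) -- same dimensionality, fed onward to the LSTM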
In a nutshell
Results
Key take home
• CNNs + NLP surely holds a lot of promise.
• Pretty successful in classification settings.
• Can prove to be a great tool for modeling the input aspects of NLP.
• What about non-classification settings?
• Sequence labeling (NER)
• Sequence generation (MT)
• As of today, not so successful.
• Though people have tried a lot of ideas there too:
• de-convolutions in generative settings
• Some architectures use different embeddings as different channels.
More Resources
• https://devblogs.nvidia.com/parallelforall/understanding-natural-language-deep-neural-networks-using-torch/
• https://medium.com/@TalPerry/convolutional-methods-for-text-d5260fd5675f
• wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
• https://blogs.technet.microsoft.com/machinelearning/2017/02/13/cloud-scale-text-classification-with-convolutional-neural-networks-on-microsoft-azure/
• https://www.aclweb.org/anthology/P/P14/P14-1062.xhtml
• https://github.com/yoonkim/lstm-char-cnn
• https://github.com/yoonkim/CNN_sentence
• https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32
• “Comparative Study of CNN and RNN for Natural Language Processing” Wenpeng Yin et. al 2017,
arXiv:1702.01923 [cs.CL]
Thanks
Questions?
@anujgupta82
anujgupta82@gmail.com