THE WONDERS OF
DEEP LEARNING: HOW
TO LEVERAGE IT FOR
NLP
DATAXDAY 2018
Paris
17/05/2018
DR. ANA PELETEIRO RAMALLO
DATA SCIENCE DIRECTOR
@PeleteiroAna
@TendamRetail
@DataXDay
Founding year: 1880
Physical shops: 2,000
Countries: 89
Employees: 10,394
Own shops: 1,299
Franchises: 683
DEEP LEARNING FOR NLP
Deep learning is having a transformative impact on many areas where machine learning has been
applied.
NLP was somewhat behind other fields in adopting deep learning for applications.
However, this has changed over the last few years, thanks to the use of RNNs, specifically LSTMs,
as well as word embeddings.
Deep learning can benefit many distinct NLP tasks, such as named entity recognition, machine
translation, language modelling, parsing, chunking and POS tagging, amongst others.
WORD EMBEDDINGS
Representing words as ids:
• Encodings are arbitrary.
• No information about the relationship between words.
• Data sparsity.
https://www.tensorflow.org/tutorials/word2vec
Word embeddings are a better representation for words:
• Words live in a continuous vector space where semantically similar words are mapped to nearby points.
• Dense embedding vectors are learned from data.
Skip-gram and CBOW
• CBOW predicts target words from the context. E.g., Tendam ?? Talk
• Skip-gram predicts source context-words from the target words. E.g., ?? conference ??
Standard preprocessing step for NLP.
Also used as features in downstream approaches, whether supervised (e.g., classification) or unsupervised (e.g., clustering).
Several parameters we can experiment with, e.g., the size of the word
embedding or the context window.
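To make the difference between the two objectives concrete, here is a minimal sketch of how training examples could be built from a tokenized sentence. The function name `training_pairs` and the toy sentence are assumptions made for illustration; real word2vec implementations add subsampling, negative sampling and so on.

```python
def training_pairs(tokens, window=1):
    """Build skip-gram and CBOW training examples for one sentence.

    Hypothetical helper for illustration only, not part of any library.
    """
    skipgram, cbow = [], []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        # Skip-gram: the target word predicts each context word separately.
        skipgram.extend((target, c) for c in context)
        # CBOW: the whole context window predicts the target word.
        cbow.append((context, target))
    return skipgram, cbow


sentence = ["tendam", "conference", "talk"]
skipgram_pairs, cbow_pairs = training_pairs(sentence, window=1)
print(skipgram_pairs)
print(cbow_pairs)
```

The `window` argument plays the role of the context window mentioned above; widening it produces more pairs per target word.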
CHARACTER EMBEDDINGS
Word embeddings are able to capture syntactic and semantic information.
For tasks such as POS tagging and NER, this is not enough:
• They do not capture intra-word morphological and shape information, i.e. sub-token patterns (suffix, prefix), etc.
• Out-of-vocabulary (OOV) word issue.
• Some languages (e.g., Chinese) write text as individual characters rather than separated words.
We can overcome these problems by using character embeddings.
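A rough sketch of why character embeddings sidestep the OOV issue: any word, seen or unseen, can be composed from the vectors of its characters. The vectors below are random (untrained), the size `EMBED_DIM` is an arbitrary choice, and averaging is only a stand-in for the character-level CNN or LSTM that would be used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8  # hypothetical embedding size

# Character vocabulary; id 0 is reserved for characters not in the alphabet.
alphabet = "abcdefghijklmnopqrstuvwxyz-"
char_to_id = {ch: i + 1 for i, ch in enumerate(alphabet)}
char_vectors = rng.normal(size=(len(alphabet) + 1, EMBED_DIM))


def embed_word(word):
    """Compose a word vector by averaging its character vectors.

    Assumes a non-empty word; a trained model would learn char_vectors.
    """
    ids = [char_to_id.get(ch, 0) for ch in word.lower()]
    return char_vectors[ids].mean(axis=0)


# A made-up word that no word-level vocabulary would contain still gets a vector.
oov_vector = embed_word("jumpsuitish")
print(oov_vector.shape)
```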
CNNs in NLP
CNNs: effectiveness in computer vision tasks.
Ability to extract salient n-gram features from the input sentence to create an informative latent semantic representation of the sentence for downstream tasks.
Several tasks: sentence classification, summarization.
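The n-gram feature extraction described on this slide can be sketched as a 1D convolution over word vectors followed by max-over-time pooling. This is a minimal NumPy illustration with random weights and made-up sizes (`EMBED_DIM`, `N_FILTERS`, `WIDTH` are assumptions); it is not the architecture of any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, N_FILTERS, WIDTH = 4, 3, 2  # hypothetical sizes; WIDTH=2 gives bigram filters


def conv_max_pool(sentence_matrix, filters, bias):
    """Apply n-gram filters to a sentence and keep the strongest response of each.

    sentence_matrix: (n_words, EMBED_DIM), with n_words >= filter width
    filters:         (N_FILTERS, WIDTH, EMBED_DIM)
    """
    n_words = sentence_matrix.shape[0]
    width = filters.shape[1]
    # Every window of `width` consecutive word vectors is one n-gram.
    windows = np.stack([sentence_matrix[i:i + width]
                        for i in range(n_words - width + 1)])
    # Dot each n-gram with each filter: result is (n_windows, N_FILTERS).
    feature_maps = np.tensordot(windows, filters, axes=([1, 2], [1, 2])) + bias
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    # Max-over-time pooling: one value per filter, whatever the sentence length.
    return feature_maps.max(axis=0)


filters = rng.normal(size=(N_FILTERS, WIDTH, EMBED_DIM))
bias = np.zeros(N_FILTERS)
features = conv_max_pool(rng.normal(size=(5, EMBED_DIM)), filters, bias)
longer_features = conv_max_pool(rng.normal(size=(12, EMBED_DIM)), filters, bias)
print(features.shape, longer_features.shape)
```

Because of the pooling step, sentences of different lengths map to a fixed-size vector that a classifier can consume.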
RECURRENT
NEURAL NETWORKS
Why not basic Deep Nets or CNNs?
Traditional neural networks and CNNs do not use information from the past;
each entry is independent.
This is fine for several applications, such as classifying images.
However, several applications, such as video or language modelling, rely on
what has happened in the past to predict the future.
Recurrent Neural Networks (RNNs) are capable of conditioning the model on
previous units in the corpus.
They can also handle inputs of arbitrary length.
RNNs
Make use of sequential information.
Output is dependent on the previous information.
An RNN shares the same parameters W across all time steps,
so there are fewer parameters to learn.
http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
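As an illustration of parameter sharing, here is a minimal vanilla RNN forward pass in NumPy. The weights are random and the dimensions are arbitrary assumptions; the point is only that the same `W_xh` and `W_hh` are reused at every time step, so the same function handles sequences of any length.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT_DIM, HIDDEN_DIM = 4, 6  # hypothetical sizes

# One set of parameters, shared by every time step.
W_xh = rng.normal(scale=0.1, size=(HIDDEN_DIM, INPUT_DIM))
W_hh = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
b_h = np.zeros(HIDDEN_DIM)


def rnn_forward(inputs):
    """Run a vanilla RNN over a sequence of shape (n_steps, INPUT_DIM)."""
    h = np.zeros(HIDDEN_DIM)
    states = []
    for x_t in inputs:
        # The new state depends on the current input and the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


short = rnn_forward(rng.normal(size=(3, INPUT_DIM)))
long = rnn_forward(rng.normal(size=(10, INPUT_DIM)))
print(short.shape, long.shape)
```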
http://torch.ch/blog/2016/07/25/nce.html
RNNs (II)
In theory, RNNs are absolutely capable of handling long-term dependencies. Practice is "a bit" different.
Parameters are shared by all time steps in the network, so the gradient at each output depends not only on the calculations of the current time step, but also on those of the previous time steps.
Exploding gradients:
• Easier to spot.
• Clip the gradient to a maximum.
Vanishing gradients:
• Harder to identify.
• Initialization of the matrix to the identity matrix.
• ReLUs instead of sigmoid.
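A small sketch of the two remedies that are easiest to show in code: clipping a gradient by its norm, and initializing the recurrent matrix to the identity. The function name `clip_by_norm` and the threshold value are assumptions for illustration; deep learning frameworks ship their own clipping utilities.

```python
import numpy as np


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad


exploding = np.array([30.0, -40.0])   # L2 norm is 50
clipped = clip_by_norm(exploding, max_norm=5.0)
small = clip_by_norm(np.array([0.3, -0.4]), max_norm=5.0)  # left untouched
print(clipped, small)

# Identity initialization of the recurrent matrix (hidden size 4 chosen arbitrarily):
# at the start of training the previous state is copied forward unchanged.
W_hh_init = np.eye(4)
```

Clipping keeps the direction of the gradient and only shrinks its magnitude, which is why it does not help with vanishing gradients.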
The oversized mannish coats looked positively edible over the bun-skimming
dresses while combined with novelty knitwear such as punk-like
fisherman's sweaters. As another look, the ballet pink Elizabeth and
James jacket provides a cozy cocoon for the 20-year-old to top off her
ensemble of a T-shirt and Parker Smith jeans. But I have to admit that
my favorite is the bun-skimming dresses with the ??
• In theory, RNNs are capable of handling such long-term dependencies.
• However, in practice, they struggle to learn them.
• LSTMs and GRUs are designed to avoid the long-term dependency problem.
• They remove or add information to the cell state, carefully regulated by
structures called gates.
• Gates are a way to optionally let information through.
LSTMs
http://cs224d.stanford.edu/lecture_notes/notes4.pdf
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
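The gating idea can be sketched as a single LSTM step in NumPy. This follows the standard formulation (forget, input and output gates plus a candidate cell value); the packing of all four blocks into one matrix `W` and the toy sizes are assumptions made to keep the example short.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, D+H) and b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:H])          # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2 * H])      # input gate: how much new information to add
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values for the cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


# Toy check: with the forget gate pushed open and the input gate pushed shut,
# the cell state is carried forward almost unchanged.
H, D = 3, 2
W = np.zeros((4 * H, D + H))
b = np.concatenate([np.full(H, 50.0), np.full(H, -50.0), np.zeros(2 * H)])
c_prev = np.array([0.5, -1.0, 2.0])
h_t, c_t = lstm_step(np.ones(D), np.zeros(H), c_prev, W, b)
print(c_t)
```

This is the mechanism that lets information survive across many time steps: when the forget gate stays close to 1, the cell state passes through with little change.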
GRUs
http://cs224d.stanford.edu/lecture_notes/notes4.pdf
RNN architectures
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
ATTENTION MECHANISM
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
https://medium.com/@Synced/a-brief-overview-of-attention-mechanism-13c578ba9129
APPLICATIONS
Word level classification: NER
Sentence classification: tweet sentiment polarity. Semantic matching between text
Text classification
Language modelling
Speech recognition
Caption generation
Machine translation
Document summarization
Question answering
EX1: TEXT GENERATION
All text from Shakespeare (4.4MB)
3-layer RNN with 512 hidden nodes on
each layer.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
https://github.com/martin-gorner/tensorflow-rnn-shakespeare
Q&A
Pedro del Hierro SS18
How can I help you today?
I was wondering what is trending this spring
This spring is all about new wave slip, in for example jumpsuits
Is that appropriate for a work dinner?
Yes, it totally works! I would recommend you to use this chilly oil jumpsuit. You can combine it with a dark brown belt and cherry tomato heels. All from Pedro del Hierro
That sounds great!
PLENTY OF RESOURCES OUT THERE!
• https://distill.pub/2016/misread-tsne/
• http://www.wildml.com
• https://arxiv.org/pdf/1708.02709.pdf
• http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• https://nlp.stanford.edu/courses/NAACL2013/
• http://cs224d.stanford.edu/syllabus.html
• https://github.com/kjw0612/awesome-rnn
• https://lvdmaaten.github.io/tsne/
• https://github.com/oxford-cs-deepnlp-2017
THANKS!
@PeleteiroAna