Interest in neural networks is growing, with many areas from image recognition to speech processing reporting impressive results. Neural networks have also found multiple applications in natural language processing. With advances in software and hardware technologies, and interest in AI-based applications growing, it is time to better understand neural networks applied to natural language processing!
In this workshop, we will cover the basics of neural networks and natural language processing, and discuss how neural approaches differ from traditional language modeling techniques, with practical applications.
1. A PRIMER ON NEURAL NETWORK MODELS FOR
NATURAL LANGUAGE PROCESSING
Copyright 2018 QuantUniversity LLC.
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
2.
QuantUniversity
• Analytics and Fintech Advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data & Fintech
• Programs
▫ Analytics Certificate Program
▫ Fintech Certification program
• Solutions
3. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup and Endeca, and with
25+ financial services and energy customers.
• Regular Columnist for the Wilmott Magazine
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA program and
at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
4.
Code and slides for today’s
workshop:
Request at:
https://tinyurl.com/QUNLP2018
5.
• Intro to Natural Language Processing
• Intro to Neural Networks and Deep Neural Networks
• Networks that “understand” language!
• Embeddings: clever representation of words
• Recurrent Neural Networks: remembering history
• Encoder-Decoder architectures
• So many models! So little time! - QuSandbox
In this session
10.
• If computers can understand language, it opens huge possibilities
▫ Read and summarize
▫ Translate
▫ Describe what’s happening
▫ Understand commands
▫ Answer questions
▫ Respond in plain language
Language allows understanding
11.
• Describe rules of grammar
• Describe meanings of words and their
relationships
• …including all the special cases
• ...and idioms
• ...and special cases for the idioms
• ...
• ...understand language!
Traditional language AI
https://en.wikipedia.org/wiki/Formal_language
12.
What is NLP?
Jumping NLP Curves
https://ieeexplore.ieee.org/document/6786458/
14.
• Ambiguity:
▫ “ground”
▫ “jaguar”
▫ “The car hit the pole while it was moving”
▫ “One morning I shot an elephant in my pajamas. How he got into my
pajamas, I’ll never know.”
▫ “The tank is full of soldiers.”
“The tank is full of nitrogen.”
Language is hard to deal with
16.
• Many ways to say the same thing
▫ “the same thing can be said in many ways”
▫ “language is versatile”
▫ “The same words can be arranged in many different ways to express
the same idea”
▫ …
Language is hard to deal with
17.
• Context matters: “I pressed a suit”
Language is hard to deal with
Images: wikipedia and pixabay
18.
Why are these funny?
“Time to do my homework #yay”
“It's a small world...
...but I wouldn't want to have to paint it.”
“Time flies like an arrow. Fruit flies like a banana.”
19.
• Learn by “reading” lots of text, some labeled.
• Less precise
• Deals with ambiguity better
Neural networks and other statistical approaches
20.
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA
Machine Learning
[Diagram: Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1]
21.
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a
given dataset:
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
[Diagram: x1, x2, x3, … → Model F(x) → y]
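To make the supervised setup concrete, here is a minimal sketch of a classifier in plain Python: a nearest-centroid rule on toy 2-D data I made up for illustration (the workshop notebooks use real models and real data).

```python
# Toy supervised classification: a nearest-centroid classifier.
# The 2-D points and labels below are made up for illustration.

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def train(X, y):
    """Compute one centroid per class label."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in by_class.items()}

def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(model, key=lambda label: dist2(model[label], x))

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
y = ["A", "A", "B", "B"]
model = train(X, y)
print(predict(model, (1.1, 0.9)))  # → A
```

Because y here is categorical, this is classification; predicting a numeric y with, say, a regression line would be the prediction case from the slide.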
35.
• MLPs:
▫ Work with fixed-size inputs; networks learn to combine inputs in
a meaningful way
• CNNs:
▫ Specialized feed-forward architectures that extract local patterns
in the data
• RNNs:
▫ Take as input a sequence of items and produce a fixed-size
vector that summarizes that sequence
Key NN architectures for NLP
37.
• Can be used with fixed/variable input sizes
• Can be used wherever linear models were used
• Useful in integrating pre-trained word embeddings
MLP in NLP
41.
▫ Specialized feed-forward architectures that extract local patterns
in the data
▫ Fixed- or variable-sized inputs
▫ Work well in identifying phrases/idioms
CNNs in NLP
42.
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of
the same network, each passing a message to a successor. [1]
[1] http://colah.github.io/posts/2015-08-Understanding-LSTMs/
43.
• Used to generate representations that are typically used in
conjunction with MLPs
• Great for sequences
• Address many challenges in language modeling (Markov
assumptions, sparsity, etc.)
RNNs in NLP
44.
• Sequence-to-sequence models (Encoder-Decoder) for machine
translation
• Learning from external, unannotated data (Semi-supervised models)
Other NN model applications
45.
• Input: posts labeled as positive / negative.
• Goal: build a classifier to classify new posts
• IMDB Dataset: http://ai.stanford.edu/~amaas/data/sentiment/
• 25,000 highly polar movie reviews for training, and 25,000 for
testing.
Sample application: sentiment detection
46.
• Goal: get familiar with the problem and establish a simple baseline.
• Overview:
▫ Load the data
▫ Look at a sample of positive and negative reviews
▫ Look at some distributional data
• Code: 08-imdb-explore.ipynb
Demo: IMDB dataset exploration
48.
• Can’t learn them all individually…
• Instead, want to have a representation that encodes relationships
between words, so we can learn e.g. that all “negative” words make
it more likely the review is negative.
Challenge: many ways to say same thing
49.
• Want computer to understand word relationships
▫ Man : King; Woman : ???
▫ Fish : Ocean; Gazelle : ???
• Goals:
▫ Encode semantic relationship between words: similarity, differences,
etc.
▫ Represent each word in a concise way
Let’s start “simple”: understanding individual words
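The analogy questions above (Man : King; Woman : ???) can be answered with simple vector arithmetic once words are vectors. Here is a minimal sketch using tiny hand-made 3-d vectors; real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Tiny hand-made "embedding" vectors, invented for illustration;
# real embeddings are learned from large corpora.
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "fish":  [0.0, 0.5, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Solve a : b :: c : ? by finding the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = set(vectors) - {a, b, c}
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # → queen
```

This is the famous "king - man + woman ≈ queen" pattern; it only works because the vectors encode the semantic relationships the slide asks for.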
50.
• An embedding is a map word -> vector that makes similar words
have similar vectors, and encodes semantic relationships.
• Creating an embedding:
▫ Look at a lot of text.
“there was a frog in the swamp”
“artificial intelligence has a long way to go”
“whether ’tis nobler in the mind to suffer the slings and arrows of
outrageous fortune”
▫ Learn what words tend to go together, which don’t.
Approach: embeddings
51.
• Learn to predict neighbors of a word.
• Compute co-occurrence counts:
• “there was a frog in a swamp”
• P(swamp,frog) = …
• P(artificial,frog) = …
• …
• Train a model word -> vector so that d(v1, v2) is small where P(w1, w2) is
high.
Creating an embedding
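The co-occurrence counting step above can be sketched in a few lines of plain Python. The toy corpus and the window size here are illustrative choices, not the settings word2vec or GloVe actually use.

```python
from collections import Counter

# Count how often word pairs co-occur within a 3-word window.
# The corpus and window size are toy values for illustration;
# real embedding training uses far more text.
corpus = [
    "there was a frog in a swamp",
    "a frog sat in the swamp",
]

window = 3
cooc = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        # pair w with the next `window` words to its right
        for j in range(i + 1, min(i + 1 + window, len(words))):
            cooc[tuple(sorted((w, words[j])))] += 1

print(cooc[("frog", "swamp")])  # → 1
print(cooc[("a", "frog")])      # → 3
```

Normalizing such counts gives the P(w1, w2) estimates from the slide, which the embedding model is then trained to reproduce.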
56.
• Pre-trained embeddings are available:
▫ Google News (100B words)
▫ Twitter (27B words)
▫ Wikipedia + Gigaword (newswire corpus) (6B words)
• It’s better to train/fine-tune for your specific application, but these
are a good place to start
▫ Especially if you don’t have much data
You don’t have to train your own embedding
List from https://github.com/3Top/word2vec-api
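Pre-trained embedding files such as GloVe ship as plain text, with one word per line followed by its vector components. A minimal loading sketch, parsing a tiny made-up sample instead of a real downloaded file:

```python
import io

# Pre-trained embedding files (e.g. GloVe) are plain text:
# each line holds a word followed by its vector components.
# A tiny made-up sample stands in for a real file here.
sample = io.StringIO(
    "frog 0.1 0.2 0.3\n"
    "swamp 0.2 0.1 0.4\n"
)

embeddings = {}
for line in sample:
    parts = line.split()
    embeddings[parts[0]] = [float(v) for v in parts[1:]]

print(embeddings["frog"])  # → [0.1, 0.2, 0.3]
```

For a real file, replace the `StringIO` sample with `open(path, encoding="utf-8")`; the parsing loop is unchanged.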
57.
• Let’s apply the approaches we already know to our movie review
sentiment task
Ok, now we have a reasonable way to represent words
58.
• Goal: use familiar network architectures for text classification
• Overview:
▫ Prepare the dataset
▫ Use a pre-trained embedding
▫ Train an MLP
▫ Train a 1D CNN
• Code: 09-imdb-mlp-cnn.ipynb
Demo: MLPs and CNNs for sentiment analysis
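A common bridge from variable-length text to the fixed-size inputs an MLP needs is to average the embedding vectors of a document's words. A minimal sketch with 2-d toy vectors I invented (not necessarily what the notebook does):

```python
# Averaging word vectors yields a fixed-size document representation
# that an MLP can consume. The 2-d vectors are toy values, not a
# real pre-trained embedding.
embeddings = {
    "great":    [0.9, 0.1],
    "terrible": [-0.8, 0.2],
    "movie":    [0.0, 0.5],
}

def doc_vector(text, dim=2):
    """Average the vectors of known words; zeros if none are known."""
    words = [w for w in text.lower().split() if w in embeddings]
    if not words:
        return [0.0] * dim
    summed = [0.0] * dim
    for w in words:
        for i, v in enumerate(embeddings[w]):
            summed[i] += v
    return [s / len(words) for s in summed]

print(doc_vector("great movie"))  # → [0.45, 0.3]
```

The resulting fixed-size vector can be fed to any of the classifiers discussed earlier; a 1-D CNN instead slides filters over the word-vector sequence, preserving local word order that averaging throws away.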
60.
“In 2009, I went to Nepal”
“I went to Nepal in 2009”
“I had high expectations, and this movie exceeded them.”
• Need to remember what we saw earlier.
• Time series → predict next element
Challenge: the state-time continuum
68.
• The same state transformation for each time step
Question: where is the parameter sharing in an RNN?
[Diagram: Input 1 → hidden layers → Input 2 → hidden layers → … → Input N → hidden layers → Output; the same parameters are used at every step]
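The parameter sharing is easiest to see in a minimal forward pass: the same weights are applied at every time step. A scalar-state sketch with arbitrary illustrative weight values:

```python
import math

# Minimal RNN forward pass with a scalar hidden state.
# The SAME parameters (w_x, w_h, b) are reused at every time
# step -- that is the parameter sharing. Values are arbitrary.
w_x, w_h, b = 0.5, 0.8, 0.0

def rnn_forward(inputs, h0=0.0):
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # same transformation each step
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, -1.0])
print(states)  # three hidden states, one per input
```

Because the loop reuses one set of weights, the network handles sequences of any length with a fixed number of parameters; in real RNNs the scalars become weight matrices.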
69.
• Again, backpropagation just works!
• In theory…
• Long-term dependencies are a problem
▫ Vanishing gradients
▫ Exploding gradients
• Solutions:
▫ Careful initialization
▫ Short sequences
▫ More advanced techniques, such as LSTM
Training RNNs
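The vanishing/exploding behavior is easy to see numerically: backpropagating through T time steps multiplies T per-step factors together. A toy sketch with illustrative factors of 0.9 and 1.1:

```python
# Backpropagation through T time steps multiplies T per-step
# factors together. A factor below 1 makes the gradient vanish;
# above 1 it explodes. The 0.9 / 1.1 factors are illustrative.

def gradient_magnitude(factor, steps):
    g = 1.0
    for _ in range(steps):
        g *= factor
    return g

print(gradient_magnitude(0.9, 100))  # vanishes: ~2.7e-05
print(gradient_magnitude(1.1, 100))  # explodes: ~1.4e+04
```

This is why careful initialization and short sequences help, and why LSTM-style gating (next slide) was introduced.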
70.
• As mentioned, RNNs have a problem with long-term dependencies
▫ Gradients disappear or blow up
• One solution: LSTM – let the network learn when to remember and when to
forget
• Used in practice
LSTM – Long Short-Term Memory networks
LSTM – Long Short-Term Memory networks
77.
• Goal: learn to caption images
• Overview:
▫ Learn abstract representations of images using a CNN
▫ Learn to map those abstract representations to sentences
▫ Train the system end-to-end
• Code sketch: 10-image-captioning.ipynb
Demo: captioning images
79.
• Code + Environment
• Dynamic scalability
• Enterprise collaboration
• Model Management
• One platform for all your analytical needs
Why QuSandbox?
80. Create Projects
➢ Instructors can create projects using AMIs, DockerHub and GitHub as resources.
➢ Additional information such as the project type (JNS, Jupyter Lab, etc.), description and name can be
specified here.
81. Run Projects
➢ QuSandbox allows users to run a
wide variety of projects hosted
on various platforms such as
AMIs, Docker Hub and Git repos.
➢ While launching, the user can
configure specifications such as the
project source, the machine
type, duration and the credits
used for the session.
➢ Users are allowed to run more
than one project at a time.
82. Launch Labs
On launching the lab, users can:
- Modify and run Jupyter notebook files, labs and other components linked to the project.
- Explore the project structure, create new files and keep track of work from previous sessions.
83. ➢ Set up account information:
username, personal details
and password.
➢ Specify courses that the user
wants to register for.
➢ Multi-role profiles allow a
user to register as one or
more roles using the same
account.
Enterprise features – Users and Roles
84. Enterprise features – Credential management
Amazon Credentials
- Update AWS keys and PEM file to grant permission to
use EC2 services for running, stopping, terminating
and extending instances.
GitHub Credentials
- Update the GitHub username and password to allow
saving project work on GitHub.
* All credentials are securely encrypted and stored in the
database.
85. Admin tools - Manage Tasks
- Running projects can be managed on the Tasks page. Information such as task and instance status, time
remaining, as well as past project information can be viewed here.
- The core project actions (LAUNCH, EXTEND, STOP and KILL) can be performed with the designated buttons in
the actions field of the task.
86. Academic use case - Courses
Instructors can use the course page to create and edit
lecture components such as slides, reading materials and
quizzes.
Students can view the uploaded material and submit
assignments for the lectures if they are registered for the
respective courses.
87. Command Line Interface on QuSandbox
The Command Line Interface is a unified tool that provides a consistent interface for interacting with all parts of
QuSandbox.
Run a specific project defined by a JSON file. After configuration completes, an
IP address will be given and the user can use the public IP address to access the
project.
Python, JavaScript
88. More Features on CLI
Use >QuSandbox -help to get details on more features.
89. Research Hub on QuSandbox
The research hub on QuSandbox allows a group of people working on a project to share and run it seamlessly.
https://researchhub.herokuapp.com/homepage
1. Button linking the project to QuSandbox. 2. View the project on QuSandbox.
90. Research Hub on QuSandbox
The research hub on QuSandbox allows a group of people working on a project to share and run it seamlessly.
➢ Each project is associated
with a unique
ProjectName.
➢ Create an embed link for
each project.
➢ Use the link from
anywhere to reach
QuSandbox.
91. Coming soon!
Logistics:
When: June 14-15
Where: Boston MA
Registration: http://qu-nlp.eventbrite.com/
Code: 25% off all ticket levels
QU25 till 5/4/2018
Code and slides for today’s workshop:
Request at: https://tinyurl.com/QUNLP2018
94. Thank you!
Presentations will be posted here:
www.analyticscertificate.com
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.