Interest in neural networks is growing, with many areas from image recognition to speech processing reporting impressive results. Neural networks have also found multiple applications in natural language processing. With advances in software and hardware technologies, and interest in AI-based applications growing, it is time to better understand neural networks applied to natural language processing!
In this workshop, we will cover the basics of neural networks and natural language processing, and discuss how neural approaches differ from traditional language modeling techniques, with practical applications.
1. A PRIMER ON NEURAL NETWORK MODELS FOR
NATURAL LANGUAGE PROCESSING
Copyright 2018 QuantUniversity LLC.
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
2.
QuantUniversity
• Analytics and Fintech Advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data & Fintech
• Programs
▫ Analytics Certificate Program
▫ Fintech Certification program
• Solutions
3. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup and Endeca, and with
25+ financial services and energy customers.
• Regular Columnist for the Wilmott Magazine
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA program and
at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
4.
Code and slides for today’s
workshop:
Request at:
https://tinyurl.com/QUNLP2018
5.
• Intro to Natural Language Processing
• Intro to Neural Networks and Deep Neural Networks
• Networks that “understand” language!
• Embeddings: clever representation of words
• Recurrent Neural Networks: remembering history
• Encoder-Decoder architectures
• So many models! So little time! - QuSandbox
In this session
10.
• If computers can understand language, it opens huge possibilities
▫ Read and summarize
▫ Translate
▫ Describe what’s happening
▫ Understand commands
▫ Answer questions
▫ Respond in plain language
Language allows understanding
11.
• Describe rules of grammar
• Describe meanings of words and their
relationships
• …including all the special cases
• ...and idioms
• ...and special cases for the idioms
• ...
• ...understand language!
Traditional language AI
https://en.wikipedia.org/wiki/Formal_language
12.
What is NLP?
Jumping NLP Curves
https://ieeexplore.ieee.org/document/6786458/
14.
• Ambiguity:
▫ “ground”
▫ “jaguar”
▫ “The car hit the pole while it was moving”
▫ “One morning I shot an elephant in my pajamas. How he got into my
pajamas, I’ll never know.”
▫ “The tank is full of soldiers.”
“The tank is full of nitrogen.”
Language is hard to deal with
16.
• Many ways to say the same thing
▫ “the same thing can be said in many ways”
▫ “language is versatile”
▫ “The same words can be arranged in many different ways to express
the same idea”
▫ …
Language is hard to deal with
17.
• Context matters: “I pressed a suit”
Language is hard to deal with
Images: wikipedia and pixabay
18.
Why are these funny?
“Time to do my homework #yay”
“It's a small world...
...but I wouldn't want to have to paint it.”
“Time flies like an arrow. Fruit flies like a banana.”
19.
• Learn by “reading” lots of text, some labeled.
• Less precise
• Deals with ambiguity better
Neural networks and other statistical approaches
20.
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA
Machine Learning
[Diagram: Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1]
21.
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a
given dataset:
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
[Diagram: x1, x2, x3, … → Model F(x) → y]
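To make the supervised setup concrete, here is a minimal sketch of a classifier in plain Python: a nearest-centroid rule on toy 2-D data I made up for illustration (the workshop notebooks use real models and real data).

```python
# Toy supervised classification: a nearest-centroid classifier.
# The 2-D points and labels below are made up for illustration.

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def train(X, y):
    """Compute one centroid per class label."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in by_class.items()}

def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(model, key=lambda label: dist2(model[label], x))

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
y = ["A", "A", "B", "B"]
model = train(X, y)
print(predict(model, (1.1, 0.9)))  # → A
```

Because y here is categorical, this is classification; predicting a numeric y with, say, a regression line would be the prediction case from the slide.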
35.
• MLPs:
▫ Work with fixed-size inputs; networks learn to combine inputs in
a meaningful way
• CNNs:
▫ Specialized feed-forward architectures that extract local patterns
in the data
• RNNs:
▫ Take as input a sequence of items and produce a fixed-size
vector that summarizes that sequence
Key NN architectures for NLP
37.
• Can be used with fixed/variable input sizes
• Can be used wherever linear models were used
• Useful in integrating pre-trained word embeddings
MLP in NLP
41.
▫ Specialized feed-forward architectures that extract local patterns
in the data
▫ Fixed- or variable-sized inputs
▫ Work well in identifying phrases/idioms
CNNs in NLP
42.
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of
the same network, each passing a message to a successor. [1]
[1] http://colah.github.io/posts/2015-08-Understanding-LSTMs/
43.
• Used to generate representations that are typically used in
conjunction with MLPs
• Great for sequences
• Address many challenges in language modeling (Markov
assumptions, sparsity, etc.)
RNNs in NLP
44.
• Sequence-to-sequence models (Encoder-Decoder) for machine
translation
• Learning from external, unannotated data (Semi-supervised models)
Other NN model applications
45.
• Input: posts labeled as positive / negative.
• Goal: build a classifier to classify new posts
• IMDB Dataset: http://ai.stanford.edu/~amaas/data/sentiment/
• 25,000 highly polar movie reviews for training, and 25,000 for
testing.
Sample application: sentiment detection
46.
• Goal: get familiar with the problem and establish a simple baseline.
• Overview:
▫ Load the data
▫ Look at a sample of positive and negative reviews
▫ Look at some distributional data
• Code: 08-imdb-explore.ipynb
Demo: IMDB dataset exploration
48.
• Can’t learn them all individually…
• Instead, want to have a representation that encodes relationships
between words, so we can learn e.g. that all “negative” words make
it more likely the review is negative.
Challenge: many ways to say same thing
49.
• Want computer to understand word relationships
▫ Man : King; Woman : ???
▫ Fish : Ocean; Gazelle : ???
• Goals:
▫ Encode semantic relationship between words: similarity, differences,
etc.
▫ Represent each word in a concise way
Let’s start “simple”: understanding individual words
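The analogy questions above (Man : King; Woman : ???) can be answered with simple vector arithmetic once words are vectors. Here is a minimal sketch using tiny hand-made 3-d vectors; real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Tiny hand-made "embedding" vectors, invented for illustration;
# real embeddings are learned from large corpora.
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "fish":  [0.0, 0.5, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Solve a : b :: c : ? by finding the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = set(vectors) - {a, b, c}
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # → queen
```

This is the famous "king - man + woman ≈ queen" pattern; it only works because the vectors encode the semantic relationships the slide asks for.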
50.
• An embedding is a map word -> vector that makes similar words
have similar vectors, and encodes semantic relationships.
• Creating an embedding:
▫ Look at a lot of text.
“there was a frog in the swamp”
“artificial intelligence has a long way to go”
“whether ’tis nobler in the mind to suffer the slings and arrows of
outrageous fortune”
▫ Learn what words tend to go together, which don’t.
Approach: embeddings
51.
• Learn to predict neighbors of a word.
• Compute co-occurrence counts:
• “there was a frog in a swamp”
• P(swamp,frog) = …
• P(artificial,frog) = …
• …
• Train a model word -> vector so that d(v1, v2) is small where P(w1, w2) is
high.
Creating an embedding
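The co-occurrence counting step above can be sketched in a few lines of plain Python. The toy corpus and the window size here are illustrative choices, not the settings word2vec or GloVe actually use.

```python
from collections import Counter

# Count how often word pairs co-occur within a 3-word window.
# The corpus and window size are toy values for illustration;
# real embedding training uses far more text.
corpus = [
    "there was a frog in a swamp",
    "a frog sat in the swamp",
]

window = 3
cooc = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        # pair w with the next `window` words to its right
        for j in range(i + 1, min(i + 1 + window, len(words))):
            cooc[tuple(sorted((w, words[j])))] += 1

print(cooc[("frog", "swamp")])  # → 1
print(cooc[("a", "frog")])      # → 3
```

Normalizing such counts gives the P(w1, w2) estimates from the slide, which the embedding model is then trained to reproduce.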
56.
• Pre-trained embeddings are available:
▫ Google News (100B words)
▫ Twitter (27B words)
▫ Wikipedia + Gigaword (newswire corpus) (6B words)
• It’s better to train/fine-tune for your specific application, but these
are a good place to start
▫ Especially if you don’t have much data
You don’t have to train your own embedding
List from https://github.com/3Top/word2vec-api
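Pre-trained embedding files such as GloVe ship as plain text, with one word per line followed by its vector components. A minimal loading sketch, parsing a tiny made-up sample instead of a real downloaded file:

```python
import io

# Pre-trained embedding files (e.g. GloVe) are plain text:
# each line holds a word followed by its vector components.
# A tiny made-up sample stands in for a real file here.
sample = io.StringIO(
    "frog 0.1 0.2 0.3\n"
    "swamp 0.2 0.1 0.4\n"
)

embeddings = {}
for line in sample:
    parts = line.split()
    embeddings[parts[0]] = [float(v) for v in parts[1:]]

print(embeddings["frog"])  # → [0.1, 0.2, 0.3]
```

For a real file, replace the `StringIO` sample with `open(path, encoding="utf-8")`; the parsing loop is unchanged.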
57.
• Let’s apply the approaches we already know to our movie review
sentiment task
Ok, now we have a reasonable way to represent words
58.
• Goal: use familiar network architectures for text classification
• Overview:
▫ Prepare the dataset
▫ Use a pre-trained embedding
▫ Train an MLP
▫ Train a 1D CNN
• Code: 09-imdb-mlp-cnn.ipynb
Demo: MLPs and CNNs for sentiment analysis
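A common bridge from variable-length text to the fixed-size inputs an MLP needs is to average the embedding vectors of a document's words. A minimal sketch with 2-d toy vectors I invented (not necessarily what the notebook does):

```python
# Averaging word vectors yields a fixed-size document representation
# that an MLP can consume. The 2-d vectors are toy values, not a
# real pre-trained embedding.
embeddings = {
    "great":    [0.9, 0.1],
    "terrible": [-0.8, 0.2],
    "movie":    [0.0, 0.5],
}

def doc_vector(text, dim=2):
    """Average the vectors of known words; zeros if none are known."""
    words = [w for w in text.lower().split() if w in embeddings]
    if not words:
        return [0.0] * dim
    summed = [0.0] * dim
    for w in words:
        for i, v in enumerate(embeddings[w]):
            summed[i] += v
    return [s / len(words) for s in summed]

print(doc_vector("great movie"))  # → [0.45, 0.3]
```

The resulting fixed-size vector can be fed to any of the classifiers discussed earlier; a 1-D CNN instead slides filters over the word-vector sequence, preserving local word order that averaging throws away.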
60.
“In 2009, I went to Nepal”
“I went to Nepal in 2009”
“I had high expectations, and this movie exceeded them.”
• Need to remember what we saw earlier.
• Time series → predict next element
Challenge: the state-time continuum
68.
• The same state transformation for each time step
Question: where is the parameter sharing in an RNN?
[Diagram: Input 1 → hidden layers → Input 2 → hidden layers → … → Input N → hidden layers → Output; the same parameters are used at every step]
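The parameter sharing is easiest to see in a minimal forward pass: the same weights are applied at every time step. A scalar-state sketch with arbitrary illustrative weight values:

```python
import math

# Minimal RNN forward pass with a scalar hidden state.
# The SAME parameters (w_x, w_h, b) are reused at every time
# step -- that is the parameter sharing. Values are arbitrary.
w_x, w_h, b = 0.5, 0.8, 0.0

def rnn_forward(inputs, h0=0.0):
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # same transformation each step
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, -1.0])
print(states)  # three hidden states, one per input
```

Because the loop reuses one set of weights, the network handles sequences of any length with a fixed number of parameters; in real RNNs the scalars become weight matrices.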
69.
• Again, backpropagation just works!
• In theory…
• Long-term dependencies are a problem
▫ Vanishing gradients
▫ Exploding gradients
• Solutions:
▫ Careful initialization
▫ Short sequences
▫ More advanced techniques, such as LSTM
Training RNNs
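The vanishing/exploding behavior is easy to see numerically: backpropagating through T time steps multiplies T per-step factors together. A toy sketch with illustrative factors of 0.9 and 1.1:

```python
# Backpropagation through T time steps multiplies T per-step
# factors together. A factor below 1 makes the gradient vanish;
# above 1 it explodes. The 0.9 / 1.1 factors are illustrative.

def gradient_magnitude(factor, steps):
    g = 1.0
    for _ in range(steps):
        g *= factor
    return g

print(gradient_magnitude(0.9, 100))  # vanishes: ~2.7e-05
print(gradient_magnitude(1.1, 100))  # explodes: ~1.4e+04
```

This is why careful initialization and short sequences help, and why LSTM-style gating (next slide) was introduced.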
70.
• As mentioned, RNNs have a problem with long-term dependencies
▫ Gradients disappear or blow up
• One solution: LSTM – let the network learn when to remember and when to
forget
• Used in practice
LSTM – Long Short-Term Memory networks
LSTM – Long Short-Term Memory networks
77.
• Goal: learn to caption images
• Overview:
▫ Learn abstract representations of images using a CNN
▫ Learn to map those abstract representations to sentences
▫ Train the system end-to-end
• Code sketch: 10-image-captioning.ipynb
Demo: captioning images
79.
• Code + Environment
• Dynamic scalability
• Enterprise collaboration
• Model Management
• One platform for all your analytical needs
Why QuSandbox?
80. Create Projects
➢ Instructors can create projects using AMIs, DockerHub and GitHub as resources.
➢ Additional information such as the project type (JNS, Jupyter Lab, etc.), description and name can be
specified here.
81. Run Projects
➢ QuSandbox allows users to run a
wide variety of projects hosted
on various platforms such as
AMIs, Docker Hub and Git repos.
➢ While launching, the user can
configure specifications such as the
project source, the machine
type, duration and the credits
used for the session.
➢ Users are allowed to run more
than one project at a time.
82. Launch Labs
On launching the lab, users can:
- Modify and run Jupyter notebook files, labs and other components linked to the project.
- Explore the project structure, create new files and keep track of work from previous sessions.
83. ➢ Set up account information:
username, personal details
and password.
➢ Specify courses that the user
wants to register for.
➢ Multi-role profiles allow a
user to register as one or
more roles using the same
account.
Enterprise features – Users and Roles
84. Enterprise features – Credential management
Amazon Credentials
- Update AWS keys and PEM file to grant permission to
use EC2 services for running, stopping, terminating
and extending instances.
GitHub Credentials
- Update the GitHub username and password to allow
saving project work on GitHub.
* All credentials are securely encrypted and stored in the
database.
85. Admin tools - Manage Tasks
- Running projects can be managed on the Tasks page. Information such as task and instance status, time
remaining, as well as past project information can be viewed here.
- The core project actions (LAUNCH, EXTEND, STOP and KILL) can be performed with the designated buttons in
the actions field of the task.
86. Academic use case - Courses
Instructors can use the course page to create and edit
lecture components such as slides, reading materials and
quizzes.
Students can view the uploaded material and submit
assignments for the lectures if they are registered for the
respective courses.
87. Command Line Interface on QuSandbox
The Command Line Interface is a unified tool that provides a consistent interface for interacting with all parts of
QuSandbox.
Run a specific project defined by a JSON file. After configuration completes, an
IP address will be given and the user can use the public IP address to access the
project.
Python, JavaScript
88. More Features on CLI
Use >QuSandbox -help to get details on more features.
89. Research Hub on QuSandbox
The research hub on QuSandbox allows a group of people working on a project to share and run it seamlessly.
https://researchhub.herokuapp.com/homepage
1. Button linking the project to QuSandbox. 2. View the project on QuSandbox.
90. Research Hub on QuSandbox
The research hub on QuSandbox allows a group of people working on a project to share and run it seamlessly.
➢ Each project is associated
with a unique
ProjectName.
➢ Create an embed link for
each project.
➢ Use the link from
anywhere to reach
QuSandbox.
91. Coming soon!
Logistics:
When: June 14-15
Where: Boston MA
Registration: http://qu-nlp.eventbrite.com/
Code: 25% off all ticket levels
QU25 till 5/4/2018
Code and slides for today’s workshop:
Request at: https://tinyurl.com/QUNLP2018
94. Thank you!
Presentations will be posted here:
www.analyticscertificate.com
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.