Natural Language Processing
Sandeep Malhotra
techBLEND Group Presentation Series, September 20, 2020
Goal
• Design algorithms that allow computers to understand language in order to perform useful tasks.
• Examples
• Spell checking
• Keyword Search
• Information parsing
• Machine Translation
• Semantic Analysis
• Question Answering
• ….
Main Approaches
• Rule-based methods
• Probabilistic Modelling & Machine Learning
• Deep Learning
What is text?
• A sequence of tokens
• A token can be a character, a word, a phrase, etc.
Text Pre-processing
• Text: NLP discussion is interesting
• Tokenize: NLP, discussion, is, interesting
• Normalize: nlp, discuss, interest
• Vectorize: [0.4566, 0.6879], [0.6789, 0.2345], [0.4201, 0.3456]
Hands-on
Text Pre-processing
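A minimal Python sketch of the Text → Tokenize → Normalize → Vectorize pipeline above; it assumes the nltk package (with its punkt and stopwords data) and uses a toy index-based vectorizer, since real vectorization with embeddings comes in the later slides.

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    nltk.download("punkt")       # tokenizer model
    nltk.download("stopwords")   # stopword list

    text = "NLP discussion is interesting"

    # Tokenize: split the raw string into word tokens
    tokens = word_tokenize(text)              # ['NLP', 'discussion', 'is', 'interesting']

    # Normalize: lowercase, drop stopwords, stem to root forms
    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))
    normalized = [stemmer.stem(t.lower()) for t in tokens if t.lower() not in stops]
    print(normalized)                         # ['nlp', 'discuss', 'interest']

    # Vectorize: toy example mapping each token to a vocabulary index;
    # real pipelines map tokens to dense vectors (see the embedding slides)
    vocab = {word: i for i, word in enumerate(sorted(set(normalized)))}
    print([vocab[word] for word in normalized])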
Representing Words
• One hot vector
• Embeddings
One hot vectors
• Each word is represented by a vector whose size equals the vocabulary size (V)
• In the vector, the value at the word’s index is 1 and the rest are 0
• High dimension
• Sparse vector
• No way to find similarity between words
• Tea and Coffee – no relationship
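A minimal NumPy sketch of one-hot vectors over a toy four-word vocabulary (the example words are illustrative):

    import numpy as np

    vocab = ["tea", "coffee", "python", "nlp"]      # |V| = 4
    index = {word: i for i, word in enumerate(vocab)}

    def one_hot(word):
        vec = np.zeros(len(vocab))                  # vector of size |V|, all zeros
        vec[index[word]] = 1                        # 1 at the word's index
        return vec

    tea, coffee = one_hot("tea"), one_hot("coffee")
    # The dot product of two different one-hot vectors is always 0,
    # so "tea" and "coffee" show no relationship
    print(tea, coffee, np.dot(tea, coffee))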
Word Embeddings
• Each word is represented by a vector of size d, where d << V
• Each dimension corresponds to a different attribute of the word
• Low dimension
• Dense vector
• Similar words are closer in vector space
• Tea and Coffee will be similar
• Missing (out-of-vocabulary) words are marked as <UNK>
• Alternatively, sub-words can be used
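A toy NumPy sketch with made-up d = 3 embeddings, only to show that similar words end up close (high cosine similarity) in the vector space:

    import numpy as np

    embeddings = {                                   # d = 3, values invented for illustration
        "tea":    np.array([0.81, 0.10, 0.45]),
        "coffee": np.array([0.78, 0.15, 0.50]),
        "python": np.array([0.05, 0.92, 0.11]),
    }

    def cosine(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    print(cosine(embeddings["tea"], embeddings["coffee"]))   # high: tea and coffee are similar
    print(cosine(embeddings["tea"], embeddings["python"]))   # low: unrelated words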
Word Embeddings Types
• Static
• A word will always have the same embedding
• The word ‘python’ in the sentences “He was bitten by a python” and “I know how to code in python” will have the same embedding
• Examples
• Word2Vec, GloVe
• Contextual
• Embedding will depend on the context
• The word ‘python’ in the sentences “He was bitten by a python” and “I know how to code in python” will have different embeddings
• Examples
• BERT
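A sketch of the contextual case, assuming the Hugging Face transformers and torch packages and the bert-base-uncased model (downloaded on first use); the word ‘python’ gets a different vector in each sentence:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def python_vector(sentence):
        # Encode the sentence and return the hidden state of the "python" token
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        return hidden[tokens.index("python")]

    v1 = python_vector("He was bitten by a python")
    v2 = python_vector("I know how to code in python")
    # The two contextual vectors differ (cosine similarity well below 1)
    print(torch.cosine_similarity(v1, v2, dim=0))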
Hands-on
Fun with Word Embeddings
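A hands-on sketch with static embeddings, assuming the gensim package and its downloadable glove-wiki-gigaword-100 vectors (a one-time network download):

    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")    # pretrained GloVe word vectors

    # Similar words are close in the vector space
    print(vectors.similarity("tea", "coffee"))       # relatively high
    print(vectors.most_similar("coffee", topn=5))

    # The classic analogy: king - man + woman is closest to queen
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))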
Feed Forward Neural Networks
Ignores word order: “Bad not good” is equivalent to “good not bad”
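A small sketch of why, using scikit-learn's CountVectorizer (an assumption for illustration): a bag-of-words representation maps both sentences to the same vector, so a feed-forward model cannot tell them apart.

    from sklearn.feature_extraction.text import CountVectorizer

    sentences = ["Bad not good", "good not bad"]
    counts = CountVectorizer().fit_transform(sentences).toarray()

    print(counts)                             # both rows are identical word counts
    print((counts[0] == counts[1]).all())     # True: word order is lost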
Recurrent Neural Networks
Remembers context (in practice, only the immediate context), but suffers from the vanishing gradient problem
LSTM – Long Short Term Memory
Ref: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
GRU – Gated Recurrent Unit
Ref: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
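A minimal PyTorch sketch (toy sizes, assumed for illustration) of the three recurrent layers above; LSTM and GRU share the plain RNN's interface but add gates that mitigate the vanishing gradient problem.

    import torch
    import torch.nn as nn

    batch, seq_len, emb_dim, hidden = 2, 5, 8, 16
    x = torch.randn(batch, seq_len, emb_dim)         # a batch of embedded sentences

    rnn = nn.RNN(emb_dim, hidden, batch_first=True)
    lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
    gru = nn.GRU(emb_dim, hidden, batch_first=True)

    out_rnn, h_rnn = rnn(x)                  # h_rnn: final hidden state
    out_lstm, (h_lstm, c_lstm) = lstm(x)     # LSTM also keeps a cell state c
    out_gru, h_gru = gru(x)
    print(out_rnn.shape, out_lstm.shape, out_gru.shape)   # torch.Size([2, 5, 16]) each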
Encoder – Decoder Architecture
Ref: https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129
• The input sentence is fed to the encoder, and its vector representation is passed to the first unit of the decoder
• The whole input sentence is represented by a single vector when passed to the decoder, so it can carry only limited information
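A minimal GRU-based sketch (toy sizes, assumed) of this bottleneck: the encoder compresses the whole input sentence into one final hidden state, which is all the decoder receives.

    import torch
    import torch.nn as nn

    emb_dim, hidden = 8, 16
    src = torch.randn(1, 6, emb_dim)     # embedded source sentence (6 words)
    tgt = torch.randn(1, 4, emb_dim)     # embedded target sentence (teacher forcing)

    encoder = nn.GRU(emb_dim, hidden, batch_first=True)
    decoder = nn.GRU(emb_dim, hidden, batch_first=True)

    _, context = encoder(src)            # one vector summarizing the whole input
    outputs, _ = decoder(tgt, context)   # decoder starts from that single vector
    print(context.shape, outputs.shape)  # torch.Size([1, 1, 16]), torch.Size([1, 4, 16])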
Attention Mechanism
Ref: https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129
• Based on how much attention each word in the decoder (output) should pay to each word in the encoder (input)
• Decoder has access to all hidden states of
encoder
• Multiple attention scoring functions are possible, e.g.
• cosine similarity
• Bahdanau Attention (Additive)
• Luong Attention (Multiplicative)
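A sketch of the simplest (dot-product) scoring with toy tensors; Bahdanau and Luong attention differ only in how the scores are computed.

    import torch
    import torch.nn.functional as F

    encoder_states = torch.randn(6, 16)      # one hidden state per input word
    decoder_state = torch.randn(16)          # current decoder (output) state

    scores = encoder_states @ decoder_state  # how relevant each input word is
    weights = F.softmax(scores, dim=0)       # attention weights, sum to 1
    context = weights @ encoder_states       # weighted sum of all encoder states
    print(weights, context.shape)            # 6 weights, a 16-dim context vector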
Transformer Architecture
Ref: https://arxiv.org/abs/1706.03762
• Uses self-attention, i.e., how much attention each word pays to the other words in the same sentence
• The encoder processes the words in parallel rather than sequentially (as recurrent networks do)
• Positional encoding is used to capture the relative positions of words in the sentence
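A sketch of scaled dot-product self-attention for one toy sentence (random tensors, single head, positional encoding omitted): every word attends to every word in the same sentence, and all positions are processed in parallel.

    import math
    import torch
    import torch.nn.functional as F

    seq_len, d_model = 5, 16
    x = torch.randn(seq_len, d_model)              # embedded (+ position-encoded) words

    Wq, Wk, Wv = (torch.randn(d_model, d_model) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv               # queries, keys, values

    scores = Q @ K.T / math.sqrt(d_model)          # each word's attention to every word
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    out = weights @ V                              # new representation for every word
    print(weights.shape, out.shape)                # torch.Size([5, 5]), torch.Size([5, 16])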
Thank You
