50 Shades of Text - Leveraging Natural
Language Processing
Alessandro Panebianco
Agenda
• About Me

• Natural Language Processing

• Vectorization Techniques

• Word Embeddings

• Sentence Embeddings

• Demo

• Lessons Learned
2
• Computer Engineering 

• Data Science Consultancy

• E-commerce 

• Energy&Utilities
About me
3
email: ale.panebianco@me.com
Natural Language Processing
Language is the method of human communication, either
spoken or written, consisting of the use of words in a
structured and conventional way
4
Natural Language Processing
The goal of Natural Language Processing is for
computers to achieve human-like comprehension of
texts/languages
5
Natural Language Processing
Why?
https://youtu.be/lXUQ-DdSDoE?t=81
6
Natural Language Processing
Applications
• Machine translation (Google Translate)

• Natural language generation (Reddit bot)

• Sentiment analysis (Cambridge Analytica)

• Lexical semantics (Thesaurus)

• Web and application search (Amazon)

• Question answering (chatbots)

…. and many others
7
How do we enable machines to
interpret language?
Transforming raw text into numerical features
8
[Diagram: raw text → Vector, via Hashing Trick, Bag of Words, TF-IDF, Word2Vec, GloVe, FastText]
Vectorization Techniques
Bag of words
How to go from words to vectors?

S1: Without music life would be a mistake
S2: Radiohead are a great music band

     without  music  life  would  be  a  mistake  Radiohead  are  great  band
S1      1       1      1     1    1   1     1         0       0     0     0
S2      0       1      0     0    0   1     0         1       1     1     1
9
๏ Dictionary size
๏ Sparsity
๏ Word order absence
✓ Easy to implement
✓ Fast
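As a concrete illustration, here is a minimal bag-of-words sketch in plain Python that reproduces the table above (the vocabulary is built in order of first appearance):

```python
# Minimal bag-of-words sketch: build a vocabulary from the corpus,
# then count word occurrences per sentence.
s1 = "Without music life would be a mistake"
s2 = "Radiohead are a great music band"
corpus = [s1.lower().split(), s2.lower().split()]

# Vocabulary in order of first appearance (matches the table above)
vocab = []
for tokens in corpus:
    for tok in tokens:
        if tok not in vocab:
            vocab.append(tok)

def bag_of_words(tokens, vocab):
    # One count per vocabulary entry; word order is lost
    return [tokens.count(word) for word in vocab]

for name, tokens in zip(["S1", "S2"], corpus):
    print(name, bag_of_words(tokens, vocab))
# S1 [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# S2 [0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
```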
Vectorization Techniques (II)
Hashing Trick
๏ Hash is one-way

๏ Different inputs can collide into the same output
10
✓ Same input -> Same output

✓ Range is always fixed (vector size)
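A minimal sketch of the hashing trick, using Python's built-in hash as a stand-in for a stable hash function (real implementations such as scikit-learn's HashingVectorizer use MurmurHash; Python's string hash changes between processes):

```python
def hashing_vectorize(tokens, n_features=8):
    # Fixed-size vector regardless of vocabulary size: no dictionary needed
    vec = [0] * n_features
    for tok in tokens:
        # hash() is one-way; different words may collide on the same index
        idx = hash(tok) % n_features
        vec[idx] += 1
    return vec

print(hashing_vectorize("Without music life would be a mistake".lower().split()))
```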
Vectorization Techniques (III)
TF-IDF
Term Frequency - Inverse Document Frequency: weight rare words higher than common words

     without  music  life  would  be    a   mistake  Radiohead  are  great  band
S1     0.3      0     0.3   0.3   0.3   0     0.3        0       0     0      0
S2      0       0      0     0     0    0      0        0.3     0.3   0.3    0.3
11
๏ Dictionary size
๏ Sparsity
๏ Word order absence
✓ Easy to implement
✓ Fast
✓ Weight words
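A minimal TF-IDF sketch using the classic tf x log(N/df) weighting; the exact numbers in the table depend on the variant and normalization used, but the key effect is visible either way: words that occur in every document ("music", "a") get zero weight.

```python
import math

corpus = ["Without music life would be a mistake".lower().split(),
          "Radiohead are a great music band".lower().split()]
vocab = sorted({tok for doc in corpus for tok in doc})

def tf_idf(doc, corpus, vocab):
    n_docs = len(corpus)
    weights = []
    for word in vocab:
        tf = doc.count(word) / len(doc)            # term frequency
        df = sum(1 for d in corpus if word in d)   # document frequency
        idf = math.log(n_docs / df)                # rare words -> higher weight
        weights.append(round(tf * idf, 3))
    return weights

for name, doc in zip(["S1", "S2"], corpus):
    print(name, dict(zip(vocab, tf_idf(doc, corpus, vocab))))
```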
Word Embeddings
Word2Vec
• The goal of word embeddings is to generate vectors
encoding semantics
12
• Word2Vec does this by starting from randomly initialized vectors
and training them so that words sharing a context window end up
with high cosine similarity
Context window sliding over: "Without music life would be a mistake"
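A minimal training sketch with gensim (assuming gensim 4, where the vector size argument is vector_size); in practice Word2Vec is trained on a far larger corpus than two sentences.

```python
from gensim.models import Word2Vec

sentences = ["Without music life would be a mistake".lower().split(),
             "Radiohead are a great music band".lower().split()]

# sg=1 -> skip-gram; window is the context window shown above
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

print(model.wv["music"][:5])            # the learned vector (first 5 dimensions)
print(model.wv.most_similar("music"))   # nearest neighbours by cosine similarity
```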
Word Embeddings
Word2Vec (II)
[Diagram: King → Queen and Man → Woman vector offsets]
13
• Analogies

• Synonyms

• Syntactic-Semantic vectors 

• Part-of-speech tagging

• Named entity recognition
King - Man + Woman = Queen
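The analogy can be reproduced with pretrained vectors; a sketch assuming gensim's downloader API and the Google News Word2Vec model (any sufficiently large pretrained model would do):

```python
import gensim.downloader as api

# Pretrained vectors (large download); any big pretrained model works
wv = api.load("word2vec-google-news-300")

# king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```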
Word Embeddings
GloVe
• It differs from word2vec in being a count-based model
rather than a predictive one

• It performs dimensionality reduction on the word
co-occurrence counts matrix

• Similarity between vectors is still measured with cosine similarity
14
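A sketch of loading pretrained GloVe vectors (the glove.6B files linked in the Demo section; the 100-dimensional file name is an assumption) and comparing words by cosine similarity:

```python
import numpy as np

def load_glove(path):
    # Each line: the word followed by its vector components
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.100d.txt")   # assumed file from glove.6B.zip

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(glove["music"], glove["band"]))
print(cosine(glove["music"], glove["mistake"]))
```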
Word Embeddings
FastText
FastText : Word Embeddings = XGBoost : Random Forest

FastText extends word-level embeddings with character n-grams (subword
information), which helps with rare words and misspellings
15
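A gensim FastText sketch illustrating the subword idea; corpus and parameters are placeholders, and the point is that an unseen or misspelled word still gets a vector built from its character n-grams:

```python
from gensim.models import FastText

sentences = ["Without music life would be a mistake".lower().split(),
             "Radiohead are a great music band".lower().split()]

model = FastText(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Out-of-vocabulary / misspelled word still gets a vector from its n-grams
print(model.wv["musik"][:5])
print(model.wv.similarity("music", "musik"))
```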
Sentence Embeddings
What if we want to represent more than a single word?

Many techniques have been utilized:

• Common aggregation operations (avg, sum,
concatenation, etc.; a minimal averaging sketch follows this list)

• Doc2Vec

• Neural Networks (CNN,LSTM,etc.)
16
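A minimal sketch of the simplest aggregation, averaging word vectors; the word_vectors lookup here is a random stand-in so the snippet runs on its own, but the same function works with the GloVe dictionary loaded in the earlier sketch.

```python
import numpy as np

# Stand-in word vectors; in practice use GloVe/Word2Vec lookups
rng = np.random.default_rng(0)
vocab = "without music life would be a mistake radiohead are great band".split()
word_vectors = {w: rng.normal(size=100).astype(np.float32) for w in vocab}

def sentence_vector(sentence, word_vectors, dim=100):
    # Average the vectors of known words; unknown words are skipped
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vecs, axis=0)

s1 = sentence_vector("Without music life would be a mistake", word_vectors)
s2 = sentence_vector("Radiohead are a great music band", word_vectors)
print(s1.shape)   # (100,) -> one fixed-size vector per sentence
```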
Sentence Embeddings (II)
Doc2Vec
Every paragraph is mapped to a unique vector

The paragraph token can be thought of as another word: it
acts as a memory that remembers what is missing from the
current context, i.e. the topic of the paragraph
17
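A minimal gensim Doc2Vec sketch (parameters are placeholders): each tagged document gets its own learned vector, the paragraph token described above.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

raw_docs = ["Without music life would be a mistake",
            "Radiohead are a great music band"]

# Each document is tagged; the tag's vector is the paragraph vector
documents = [TaggedDocument(words=doc.lower().split(), tags=[i])
             for i, doc in enumerate(raw_docs)]

model = Doc2Vec(documents, vector_size=50, min_count=1, epochs=50)

print(model.dv[0][:5])                                              # learned paragraph vector
print(model.infer_vector("music would be great".lower().split()))   # vector for unseen text
```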
Sentence Embeddings (III)
CNN
• Stacking word vectors together creates a matrix (like an image)

• Filters act like word scans (e.g. catching misspellings)

• Max pooling highlights the most important words
(e.g. which item a query is about)

• An LSTM layer can be added to keep the word order
18
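A Keras sketch of this CNN idea (vocabulary size, sequence length and the single relevance output are assumptions): stacked word embeddings form a matrix, 1-D convolutions scan word windows, and global max pooling keeps the strongest feature per filter.

```python
import tensorflow as tf

VOCAB_SIZE, SEQ_LEN, EMB_DIM = 10_000, 50, 100   # assumed sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),                               # token ids
    tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM),                 # words -> matrix of word vectors
    tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu"),  # filters scan 3-word windows
    tf.keras.layers.GlobalMaxPooling1D(),                           # keep strongest feature per filter
    tf.keras.layers.Dense(1, activation="sigmoid"),                 # e.g. search-relevance score
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```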
Sentence Embeddings (IV)
LSTM
• RNNs resemble how we process language, word by word (e.g. reading
a Google search query)

• The LSTM layer generates a new encoding of the original
input that preserves the word order
(return_sequences=True)

• The convolution layer then filters the most important local
features (e.g. which item a query is about)
19
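A Keras sketch of this LSTM variant (sizes and the single output are assumptions): the LSTM re-encodes the sequence position by position (return_sequences=True), then a convolution plus max pooling picks out the most important local features.

```python
import tensorflow as tf

VOCAB_SIZE, SEQ_LEN, EMB_DIM = 10_000, 50, 100   # assumed sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM),
    # return_sequences=True keeps one encoding per position, preserving word order
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),   # most important local features
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```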
Demo
Training Data:

https://www.kaggle.com/c/home-depot-product-search-
relevance/data

GloVe vectors:

http://nlp.stanford.edu/data/glove.6B.zip
20
Lessons Learned
• NLP is one of the most mature research fields in the AI
space 

• Train your own word embeddings when your domain has an
ad-hoc vocabulary

• With a large corpus, try FastText

• With short texts (e.g. user queries), experiment with higher text
granularity (n-grams, characters)

• Explore sentence embeddings through Neural Networks
21
Questions?
