2. Agenda
• Natural Language Processing Background
• Methods used in NLP
• Applications
• Sentiment Analysis
• Usage in TripAdvisor
• Challenges
3. What is Natural Language Processing?
[Diagram: Text → NLP → Structured Data → Applications, e.g., Machine Reading]
4. Methods in NLP
• Automatic Summarization: shorten text while preserving its meaning
• “There are basically two types of auctions.” → “There are two types of auctions.”
• Part-of-speech Tagging: classify and label words (see the sketch after this list)
• They refuse to permit us to obtain the refuse permit
• [('They', 'pronoun'), ('refuse', 'verb'), ('to', 'preposition'), ('permit', 'verb'), …]
• Entity Extraction:
• People, organizations, locations, times, dates, prices, …
• Relation Extraction:
• Located in, employed by, part of, married to, ...
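A minimal POS-tagging sketch for the example above, assuming NLTK (the talk does not name a specific tagger); note NLTK emits Penn Treebank tags (PRP, VBP, NN, …) rather than plain labels like "pronoun":

import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "They refuse to permit us to obtain the refuse permit"
tokens = nltk.word_tokenize(sentence)

# The tagger uses context, so the two occurrences of "refuse" and "permit"
# receive different tags (verb vs. noun).
print(nltk.pos_tag(tokens))
# e.g. [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ...]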
5. Applications
• Machine Translation: Google Translate
• Word-sense ambiguity: “An electric guitar and bass player stand off …” (bass = instrument)
• “… fish such as Pacific salmon and striped bass” (bass = fish)
• Email Spam Filters: Gmail
• A Naive Bayes classifier is used to classify emails as spam or ham (see the sketch after this list)
• P(spam|word) = P(word|spam)*P(spam)/P(word)
• Question-Answering: Amazon’s Alexa, Google Home
• Amazon Lex: AI API used in Amazon’s Alexa
• Sentiment Analysis: Opinion Mining
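A minimal sketch of a Naive Bayes spam filter using scikit-learn (an assumption; Gmail's actual pipeline is not public), with hypothetical toy emails:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; a real filter is trained on far more email.
emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money claim your prize", "project update attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)          # word-count features
clf = MultinomialNB().fit(X, labels)          # applies Bayes' rule per word

# Expected to be classified as spam (1), since its words occur only in the spam examples.
print(clf.predict(vectorizer.transform(["claim your free prize"])))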
6. Sentiment Analysis
• What is it?
• Determine the emotional tone behind a series of words
• Uses
• Political Polling: 2012 Presidential Election
• Business Purpose: TripAdvisor
11. Example
• “Beautiful impressionist paintings and outstanding sculptures. For
me, the original buildings were the best bit! The renovations and
creation of an amazing museum are a work of art in themselves.
Loved the paintings although a bit disappointed with the low number
of Van Gogh.” 😄
• Score: 0.301644
12. Example
• Words matched in the review: “beautiful impressionist … outstanding … best … amazing … love … disappoint …”
• Pre-Tagged Dictionary
• Positive: [beautiful, wonderful, best, outstanding, amazing, love, …]
• Negative: [disappoint, sad, unhappy, …]
• Score: 0.301644 (see the sketch below)
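A minimal sketch of this dictionary-based scoring; the word lists and the (positive - negative) / token-count formula are illustrative assumptions and will not reproduce the exact 0.301644 score:

POSITIVE = {"beautiful", "wonderful", "best", "outstanding", "amazing", "love", "loved"}
NEGATIVE = {"disappoint", "disappointed", "sad", "unhappy"}

def lexicon_score(text):
    # Very rough tokenization: lowercase and strip basic punctuation.
    tokens = text.lower().replace(".", " ").replace(",", " ").replace("!", " ").split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

review = ("Beautiful impressionist paintings and outstanding sculptures. "
          "Loved the paintings although a bit disappointed with the low number of Van Gogh.")
print(lexicon_score(review))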
13. Machine Learning Based Approach
Load & Pre-Process Data → Extract Features → Train Model → Evaluate Model
14. ML Based Approach
• Load Data
• 25,000 labeled training tweets
• Another 25,000 validation tweets
• 50,000 test tweets
15. ML Based Approach
• Pre-Process Data:
• Remove punctuation: “I like this one!!!!!” -> “I like this one”
• Filter out stopwords: “this”, “the”
• Normalize each contiguous run of whitespace to a single space: “ goodd ” -> “goodd”
• Convert to lowercase: “Upper” -> “upper”
• Stemming: “Learning” -> “learn”, “Done” -> “do”
• Tokenization: split the text into individual word tokens (see the sketch below)
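A minimal sketch of these pre-processing steps, assuming NLTK for stopwords, stemming, and tokenization (the talk does not name the exact libraries):

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt")
nltk.download("stopwords")

def preprocess(text):
    text = text.lower()                        # "Upper" -> "upper"
    text = re.sub(r"[^\w\s]", "", text)        # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    tokens = nltk.word_tokenize(text)          # tokenization
    stop = set(stopwords.words("english"))     # filter stopwords ("this", "the", ...)
    stemmer = PorterStemmer()                  # stemming: "learning" -> "learn"
    return [stemmer.stem(t) for t in tokens if t not in stop]

print(preprocess("I like this one!!!!!"))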
16. ML Based Approach
• Extract Features
• Use Word2Vec model to map each word into an n-dimensional vector
• Each element of the vector can be viewed as a feature
17. What Is Word2Vec Model
• Use:
• Map each word into a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: word vectors: w = (w1, w2, …, wn)
• Given a word, find similar words
• Advantage:
• Preserves semantic relationships between words
18. What Is Word2Vec Model
vec(“king”) – vec(“man”) + vec(“woman”) =~ vec(“queen”)
[Figure: 2D projection of the word vectors for man, woman, king, queen; the offset man → woman is roughly parallel to king → queen]
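A minimal sketch of this analogy arithmetic with gensim, assuming the small pretrained "glove-wiki-gigaword-50" vectors from gensim's downloader (not the model trained in this talk):

import gensim.downloader as api

word_vectors = api.load("glove-wiki-gigaword-50")   # downloads on first use

# vec("king") - vec("man") + vec("woman") should land near vec("queen").
print(word_vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))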
19. What Is Word2Vec Model
• Use: map each word into a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: word vectors: w = (w1, w2, …, wn)
• Advantage:
• Preserves semantic relationships between words
• Feature:
• Measures “how close” words or phrases are to each other
• The angle between the vectors of two words is an indicator of how similar the words are
21. How To Train A Word2Vec Model?
• Build the model using gensim: an open-source Python toolkit
• model = Word2Vec(tweets, size=200, window=2, min_count=5, workers=4)
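A slightly fuller sketch of the same call with a toy corpus standing in for the tweets; note that gensim 4.0+ renamed the size parameter to vector_size:

from gensim.models import Word2Vec

# Toy corpus; in the talk this is the list of tokenized tweets.
sentences = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["the", "quick", "brown", "rabbit", "jumps", "out", "of", "the", "sink"],
]

# min_count=1 only because the toy corpus is tiny (the slide uses 5).
model = Word2Vec(sentences, vector_size=200, window=2, min_count=1, workers=4)
print(model.wv.most_similar("fox", topn=3))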
22. How To Train A Word2Vec Model?
Source Text (window = 2): The quick brown fox jumps over the lazy dog
Training Samples:
• center word “the”: (the, quick), (the, brown)
• center word “quick”: (quick, the), (quick, brown), (quick, fox)
• center word “brown”: (brown, the), (brown, quick), (brown, fox), (brown, jumps)
• center word “fox”: (fox, quick), (fox, brown), (fox, jumps), (fox, over)
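A small illustrative helper (not part of gensim) that generates the same (center, context) training pairs shown above for window = 2:

def skipgram_pairs(tokens, window=2):
    # For each center word, pair it with every word up to `window`
    # positions to its left and right.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox jumps over the lazy dog".split()))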
23. How To Train A Word2Vec Model?
Source Text (window = 2): The quick brown rabbit jumps out of the sink
Training Samples:
• center word “the”: (the, quick), (the, brown)
• center word “quick”: (quick, the), (quick, brown), (quick, rabbit)
• center word “brown”: (brown, the), (brown, quick), (brown, rabbit), (brown, jumps)
• center word “rabbit”: (rabbit, quick), (rabbit, brown), (rabbit, jumps), (rabbit, out)
24. How To Train A Word2Vec Model?
For a given word, e.g. “rabbit”, we get similar words that occur in the same contexts:
• Input:
• tweet_w2v.most_similar('rabbit')
• Output:
• [(u'fox', 0.7355118989944458), (u'jump', 0.7164269685745239), …]
25. How To Train A Word2Vec Model?
• Input:
• tweet_w2v.most_similar('good')
• Output:
• [(u'goood', 0.7355118989944458), (u'great', 0.7164269685745239),…]
26. Word2Vec Usage in TripAdvisor
• User browse sequence: Madrid, Lisbon, Barcelona, Boston
• Treated as the “sentence”: “Madrid, Lisbon, Barcelona, Boston” (see the sketch below)
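A minimal sketch of the idea with hypothetical browse sequences; each user's sequence of viewed cities is treated as one "sentence" for Word2Vec:

from gensim.models import Word2Vec

# Hypothetical browse sequences (one per user), not real TripAdvisor data.
browse_sequences = [
    ["Madrid", "Lisbon", "Barcelona", "Boston"],
    ["Madrid", "Barcelona", "Seville"],
    ["Boston", "New York", "Lisbon"],
]

geo_model = Word2Vec(browse_sequences, vector_size=50, window=2, min_count=1, workers=2)
print(geo_model.wv.most_similar("Boston", topn=2))   # geos browsed in similar contexts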
27. ML Based Approach
• Train the Model
• Represent each word using Word2Vec
• Combine the word vectors into one feature vector per tweet (e.g., by averaging)
• Train the classifier on these feature vectors (see the sketch below)
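A minimal sketch of this step with toy data; averaging the word vectors and using logistic regression are assumptions, since the slides do not specify how the vectors are combined or which classifier is used:

import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Toy labeled tweets standing in for the 25,000-tweet training set.
tweets = [["i", "love", "this", "movie"], ["great", "amazing", "film"],
          ["i", "hate", "this", "movie"], ["terrible", "boring", "film"]]
labels = [1, 1, 0, 0]   # 1 = positive, 0 = negative

w2v = Word2Vec(tweets, vector_size=50, window=2, min_count=1, workers=2)

def tweet_vector(tokens):
    # Average the Word2Vec vectors of the words in the tweet.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.array([tweet_vector(t) for t in tweets])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.score(X, labels))   # accuracy on the toy training data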
28. ML Based Approach
• Evaluate the Model
• Use the 50,000 test tweets to assess the model
• Accuracy: 0.78984528240986307
29. Challenges
• Some challenging examples
• “My flight’s been delayed. Brilliant!” ☹️ (Sarcasm)
• “I do not dislike cabin cruisers.” (Negation handling)
• Some promising work, but accuracy is still low
• “Contextualized Sarcasm Detection on Twitter”, David Bamman and Noah A. Smith
30. Resources
• Online course:
• https://www.coursera.org/learn/natural-language-processing
• Open resource:
• https://nlp.stanford.edu/ : Stanford NLP group
• https://arxiv.org/
Each tweet is labeled 1 when it is positive and 0 when it is negative.
Validation tweets are used to tune the model and prevent overfitting. A neural network is used to train the hidden and output layers.
For example, patterns such as “Man is to Woman as King is to Queen” can be generated through algebraic operations on the vector representations of these words: the vector for “Brother” - “Man” + “Woman” produces a result closest to the vector representation of “Sister” in the model. The vector offsets are roughly parallel to each other.
Now that we have some background on Word2Vec, let me continue with how to train a Word2Vec model. The common way is to use gensim; calling Word2Vec() builds the model for us, fed with a large corpus of sentences that is used to build the vocabulary. size is the dimensionality of the word vectors. window is the maximum distance between the current and predicted word within a sentence. min_count ignores all words with total frequency lower than this. workers is the number of worker threads used to train the model; because the text corpus is really large, I set it to 4. If we set the window size to 2 and the dimension to 200, how does it work? Let me demonstrate this with a single input sentence.
Given a specific word in the middle of a sentence (the input word), look at the words nearby.
The output probabilities relate to how likely it is to find each vocabulary word near our input word.
For example, if you gave the trained network the input word “Soviet”, the output probabilities are going to be much higher for words like “Union” and “Russia” than for unrelated words like “watermelon” and “kangaroo”.
TripAdvisor’s recommendations use a Word2Vec model.
For example, a user’s browse sequence is “Madrid, …”, which means this user actually searched for or browsed Madrid, then eventually Boston.
So we can make up a sentence from the user’s browse sequence; the sentence we feed to the Word2Vec model is “Madrid, Lisbon, …”, just like we did for “The quick brown fox jumps over the lazy dog”.
After feeding it many such sentences from different users, it learns pretty well how geos are similar in meaning. Then, after I book a vacation rental in Boston, it will also recommend other places in Spain.
Sarcasm is hard even for people to detect.
Sarcasm depends on its context.
The authors argue that the relationship between author and audience is central to understanding the sarcasm phenomenon. Promising work looks at attributes of the author (author features), attributes of the intended recipient of a tweet (audience features), and attributes of responses to potentially sarcastic tweets (response features).
Negation handling: one approach uses grammatical relations among words to model a sentence and hence determine which words are affected by negation; another uses a static window and punctuation marks to determine the scope of negation.
Using natural language processing to detect sarcasm on the internet still has a long way to go and may never be particularly reliable.