2. Agenda
• Natural Language Processing Background
• Methods used in NLP
• Applications
• Sentiment Analysis
• Usage in TripAdvisor
• Challenges
3. What is Natural Language Processing?
[Diagram: Text → NLP → Structured Data → Applications, e.g., Machine Reading]
4. Methods in NLP
• Automatic Summarization: shorten text while preserving its meaning
• “There are basically two types of auctions.” → “There are two types of auctions.”
• Part-of-speech Tagging: classify and label words (see the sketch after this list)
• They refuse to permit us to obtain the refuse permit
• [('They', 'pronoun'), ('refuse', 'verb'), ('to', 'preposition'), ('permit', 'verb'), …]
• Entity Extraction:
• People, organizations, locations, times, dates, prices, …
• Relation Extraction:
• Located in, employed by, part of, married to, ...
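A minimal POS-tagging sketch for the example above, assuming NLTK (the talk does not name a specific tagger); note NLTK emits Penn Treebank tags (PRP, VBP, NN, …) rather than plain labels like "pronoun":

import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "They refuse to permit us to obtain the refuse permit"
tokens = nltk.word_tokenize(sentence)

# The tagger uses context, so the two occurrences of "refuse" and "permit"
# receive different tags (verb vs. noun).
print(nltk.pos_tag(tokens))
# e.g. [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ...]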
5. Applications
• Machine Translation: Google Translate
• Word-sense ambiguity: “An electric guitar and bass player stand off …” (bass = instrument)
• “… fish such as Pacific salmon and striped bass” (bass = fish)
• Email Spam Filters: Gmail
• A Naive Bayes classifier is used to classify emails as spam or ham (see the sketch after this list)
• P(spam|word) = P(word|spam)*P(spam)/P(word)
• Question-Answering: Amazon’s Alexa, Google Home
• Amazon Lex: AI API used in Amazon’s Alexa
• Sentiment Analysis: Opinion Mining
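A minimal sketch of a Naive Bayes spam filter using scikit-learn (an assumption; Gmail's actual pipeline is not public), with hypothetical toy emails:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; a real filter is trained on far more email.
emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money claim your prize", "project update attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)          # word-count features
clf = MultinomialNB().fit(X, labels)          # applies Bayes' rule per word

# Expected to be classified as spam (1), since its words occur only in the spam examples.
print(clf.predict(vectorizer.transform(["claim your free prize"])))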
6. Sentiment Analysis
• What is it?
• Determine the emotional tone behind a series of words
• Uses
• Political Polling: 2012 Presidential Election
• Business Purpose: TripAdvisor
11. Example
• “Beautiful impressionist paintings and outstanding sculptures. For
me, the original buildings were the best bit! The renovations and
creation of an amazing museum are a work of art in themselves.
Loved the paintings although a bit disappointed with the low number
of Van Gogh.” 😄
• Score: 0.301644
12. Example
• Words matched in the review: “beautiful impressionist … outstanding … best … amazing … love … disappoint …”
• Pre-Tagged Dictionary
• Positive: [beautiful, wonderful, best, outstanding, amazing, love, …]
• Negative: [disappoint, sad, unhappy, …]
• Score: 0.301644 (see the sketch below)
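A minimal sketch of this dictionary-based scoring; the word lists and the (positive - negative) / token-count formula are illustrative assumptions and will not reproduce the exact 0.301644 score:

POSITIVE = {"beautiful", "wonderful", "best", "outstanding", "amazing", "love", "loved"}
NEGATIVE = {"disappoint", "disappointed", "sad", "unhappy"}

def lexicon_score(text):
    # Very rough tokenization: lowercase and strip basic punctuation.
    tokens = text.lower().replace(".", " ").replace(",", " ").replace("!", " ").split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

review = ("Beautiful impressionist paintings and outstanding sculptures. "
          "Loved the paintings although a bit disappointed with the low number of Van Gogh.")
print(lexicon_score(review))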
13. Machine Learning Based Approach
Load & Pre-Process Data → Extract Features → Train Model → Evaluate Model
14. ML Based Approach
• Load Data
• 25,000 labeled training tweets
• Another 25,000 validation tweets
• 50,000 test tweets
15. ML Based Approach
• Pre-Process Data:
• Remove punctuation: “I like this one!!!!!” -> “I like this one”
• Filter out stopwords: “this”, “the”
• Normalize each contiguous run of whitespace to a single space: “ goodd ” -> “goodd”
• Convert to lowercase: “Upper” -> “upper”
• Stemming: “Learning” -> “learn”, “Done” -> “do”
• Tokenization: split the text into individual word tokens (see the sketch below)
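A minimal sketch of these pre-processing steps, assuming NLTK for stopwords, stemming, and tokenization (the talk does not name the exact libraries):

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt")
nltk.download("stopwords")

def preprocess(text):
    text = text.lower()                        # "Upper" -> "upper"
    text = re.sub(r"[^\w\s]", "", text)        # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    tokens = nltk.word_tokenize(text)          # tokenization
    stop = set(stopwords.words("english"))     # filter stopwords ("this", "the", ...)
    stemmer = PorterStemmer()                  # stemming: "learning" -> "learn"
    return [stemmer.stem(t) for t in tokens if t not in stop]

print(preprocess("I like this one!!!!!"))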
16. ML Based Approach
• Extract Features
• Use Word2Vec model to map each word into an n-dimensional vector
• Each element of the vector can be viewed as a feature
17. What Is Word2Vec Model
• Use:
• Map each word into a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: word vectors: w = (w1, w2, …, wn)
• Given a word, find similar words
• Advantage:
• Preserves semantic relationships between words
18. What Is Word2Vec Model
vec(“king”) – vec(“man”) + vec(“woman”) =~ vec(“queen”)
[Figure: 2D projection of the word vectors for man, woman, king, queen; the offset man → woman is roughly parallel to king → queen]
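A minimal sketch of this analogy arithmetic with gensim, assuming the small pretrained "glove-wiki-gigaword-50" vectors from gensim's downloader (not the model trained in this talk):

import gensim.downloader as api

word_vectors = api.load("glove-wiki-gigaword-50")   # downloads on first use

# vec("king") - vec("man") + vec("woman") should land near vec("queen").
print(word_vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))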
19. What Is Word2Vec Model
• Use: map each word into a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: word vectors: w = (w1, w2, …, wn)
• Advantage:
• Preserves semantic relationships between words
• Feature:
• Measures “how close” words or phrases are to each other
• The angle between the vectors of two words is an indicator of how similar the words are
21. How To Train A Word2Vec Model?
• Build the model using gensim: an open-source Python toolkit
• model = Word2Vec(tweets, size=200, window=2, min_count=5, workers=4)
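A slightly fuller sketch of the same call with a toy corpus standing in for the tweets; note that gensim 4.0+ renamed the size parameter to vector_size:

from gensim.models import Word2Vec

# Toy corpus; in the talk this is the list of tokenized tweets.
sentences = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["the", "quick", "brown", "rabbit", "jumps", "out", "of", "the", "sink"],
]

# min_count=1 only because the toy corpus is tiny (the slide uses 5).
model = Word2Vec(sentences, vector_size=200, window=2, min_count=1, workers=4)
print(model.wv.most_similar("fox", topn=3))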
22. How To Train A Word2Vec Model?
Source Text (window = 2): The quick brown fox jumps over the lazy dog
Training Samples:
• center word “the”: (the, quick), (the, brown)
• center word “quick”: (quick, the), (quick, brown), (quick, fox)
• center word “brown”: (brown, the), (brown, quick), (brown, fox), (brown, jumps)
• center word “fox”: (fox, quick), (fox, brown), (fox, jumps), (fox, over)
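A small illustrative helper (not part of gensim) that generates the same (center, context) training pairs shown above for window = 2:

def skipgram_pairs(tokens, window=2):
    # For each center word, pair it with every word up to `window`
    # positions to its left and right.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox jumps over the lazy dog".split()))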
23. How To Train A Word2Vec Model?
Source Text (window = 2): The quick brown rabbit jumps out of the sink
Training Samples:
• center word “the”: (the, quick), (the, brown)
• center word “quick”: (quick, the), (quick, brown), (quick, rabbit)
• center word “brown”: (brown, the), (brown, quick), (brown, rabbit), (brown, jumps)
• center word “rabbit”: (rabbit, quick), (rabbit, brown), (rabbit, jumps), (rabbit, out)
24. How To Train A Word2Vec Model?
For a given word, e.g. “rabbit”, we get similar words that occur in the same contexts:
• Input:
• tweet_w2v.most_similar('rabbit')
• Output:
• [(u'fox', 0.7355118989944458), (u'jump', 0.7164269685745239), …]
25. How To Train A Word2Vec Model?
• Input:
• tweet_w2v.most_similar('good')
• Output:
• [(u'goood', 0.7355118989944458), (u'great', 0.7164269685745239),…]
26. Word2Vec Usage in TripAdvisor
• User browse sequence: Madrid, Lisbon, Barcelona, Boston
• Treated as the “sentence”: “Madrid, Lisbon, Barcelona, Boston” (see the sketch below)
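A minimal sketch of the idea with hypothetical browse sequences; each user's sequence of viewed cities is treated as one "sentence" for Word2Vec:

from gensim.models import Word2Vec

# Hypothetical browse sequences (one per user), not real TripAdvisor data.
browse_sequences = [
    ["Madrid", "Lisbon", "Barcelona", "Boston"],
    ["Madrid", "Barcelona", "Seville"],
    ["Boston", "New York", "Lisbon"],
]

geo_model = Word2Vec(browse_sequences, vector_size=50, window=2, min_count=1, workers=2)
print(geo_model.wv.most_similar("Boston", topn=2))   # geos browsed in similar contexts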
27. ML Based Approach
• Train the Model
• Represent each word using Word2Vec
• Combine the word vectors into one feature vector per tweet (e.g., by averaging)
• Train the classifier on these feature vectors (see the sketch below)
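A minimal sketch of this step with toy data; averaging the word vectors and using logistic regression are assumptions, since the slides do not specify how the vectors are combined or which classifier is used:

import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Toy labeled tweets standing in for the 25,000-tweet training set.
tweets = [["i", "love", "this", "movie"], ["great", "amazing", "film"],
          ["i", "hate", "this", "movie"], ["terrible", "boring", "film"]]
labels = [1, 1, 0, 0]   # 1 = positive, 0 = negative

w2v = Word2Vec(tweets, vector_size=50, window=2, min_count=1, workers=2)

def tweet_vector(tokens):
    # Average the Word2Vec vectors of the words in the tweet.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.array([tweet_vector(t) for t in tweets])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.score(X, labels))   # accuracy on the toy training data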
28. ML Based Approach
• Evaluate the Model
• Use the 50,000 test tweets to assess the model
• Accuracy: 0.78984528240986307
29. Challenges
• Some challenging examples
• “My flight’s been delayed. Brilliant!” ☹️ (Sarcasm)
• “I do not dislike cabin cruisers.” (Negation handling)
• Some promising work, but accuracy is still low
• “Contextualized Sarcasm Detection on Twitter”, David Bamman and Noah A. Smith
30. Resources
• Online course:
• https://www.coursera.org/learn/natural-language-processing
• Open resource:
• https://nlp.stanford.edu/ : Stanford NLP group
• https://arxiv.org/
Each tweet is labeled 1 when it is positive and 0 when it is negative.
Validation tweets are used to tune the model and prevent overfitting. A neural network is used to train the hidden and output layers.
For example, patterns such as “Man is to Woman as King is to Queen” can be generated through algebraic operations on the vector representations of these words: the vector for “Brother” - “Man” + “Woman” produces a result closest to the vector representation of “Sister” in the model. The vector offsets are roughly parallel to each other.
Now that we have some background on Word2Vec, let me continue with how to train a Word2Vec model. The common way is to use gensim; calling Word2Vec() builds the model for us, fed with a large corpus of sentences that is used to build the vocabulary. size is the dimensionality of the word vectors. window is the maximum distance between the current and predicted word within a sentence. min_count ignores all words with total frequency lower than this. workers is the number of worker threads used to train the model; because the text corpus is really large, I set it to 4. If we set the window size to 2 and the dimension to 200, how does it work? Let me demonstrate this with a single input sentence.
Given a specific word in the middle of a sentence (the input word), look at the words nearby.
The output probabilities relate to how likely it is to find each vocabulary word near our input word.
For example, if you gave the trained network the input word “Soviet”, the output probabilities are going to be much higher for words like “Union” and “Russia” than for unrelated words like “watermelon” and “kangaroo”.
TripAdvisor’s recommendations use a Word2Vec model.
For example, a user’s browse sequence is “Madrid, …”, which means this user actually searched for or browsed Madrid, then eventually Boston.
So we can make up a sentence from the user’s browse sequence; the sentence we feed to the Word2Vec model is “Madrid, Lisbon, …”, just like we did for “The quick brown fox jumps over the lazy dog”.
After feeding it many such sentences from different users, it learns pretty well how geos are similar in meaning. Then, after I book a vacation rental in Boston, it will also recommend other places in Spain.
Sarcasm is hard even for people to detect.
Sarcasm depends on its context.
The authors argue that the relationship between author and audience is central to understanding the sarcasm phenomenon. Promising work looks at attributes of the author (author features), attributes of the intended recipient of a tweet (audience features), and attributes of responses to potentially sarcastic tweets (response features).
Negation handling: one approach uses grammatical relations among words to model a sentence and hence determine which words are affected by negation; another uses a static window and punctuation marks to determine the scope of negation.
Using natural language processing to detect sarcasm on the internet still has a long way to go and may never be particularly reliable.