Natural Language Processing
Qi Zhang
1
Agenda
• Natural Language Processing Background
• Methods used in NLP
• Applications
• Sentiment Analysis
• Usage in TripAdvisor
• Challenges
2
What is Natural Language Processing?
Text -> NLP -> Structured Data -> Applications
• Machine Reading
3
Methods in NLP
• Automatic Summarization:
• There are basically two types of auctions.
• There are two types of auctions.
• Part-of-speech Tagging: classify and label words
• They refuse to permit us to obtain the refuse permit
• [('They', 'pronoun'), ('refuse', 'verb'), ('to', 'preposition'), ('permit', 'verb'), …]
• Entity Extraction:
• People, organizations, locations, times, dates, prices, …
• Relation Extraction:
• Located in, employed by, part of, married to, ...
4
Applications
• Machine Translation: Google Translate
• An electric guitar and bass player stand off…...
• fish as Pacific salmon and striped bass
• Email Spam Filters: Gmail
• Naive Bayes classifier is used to identify spam/ham emails
• P(spam|word) = P(word|spam)*P(spam)/P(word)
• Question-Answering: Amazon’s Alexa, Google Home
• Amazon Lex: AI API used in Amazon’s Alexa
• Sentiment Analysis: Opinion Mining
5
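The Bayes' rule bullet on the spam-filter slide can be checked with a few lines of Python. This is only a sketch for a single word: the four-email corpus is made up for illustration, and a real Naive Bayes filter would combine many words and smooth the counts.

```python
# Toy labeled corpus (made-up data, for illustration only).
EMAILS = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch money for noon meeting", "ham"),
]

def p_spam_given_word(word, emails=EMAILS):
    """Return P(spam | word) = P(word | spam) * P(spam) / P(word)."""
    n_total = len(emails)
    n_spam = sum(1 for _, label in emails if label == "spam")
    # Count the emails that contain the word, overall and within spam.
    n_word = sum(1 for text, _ in emails if word in text.split())
    n_word_spam = sum(
        1 for text, label in emails if label == "spam" and word in text.split()
    )
    if n_word == 0:
        return None  # the word never occurs, so P(word) is zero
    p_spam = n_spam / n_total
    p_word = n_word / n_total
    p_word_given_spam = n_word_spam / n_spam
    return p_word_given_spam * p_spam / p_word

print(p_spam_given_word("money"))    # "money" appears in 2 spam emails and 1 ham email
print(p_spam_given_word("meeting"))  # "meeting" appears only in ham emails
```

With this toy data, "money" comes out at two thirds and "meeting" at zero, which matches counting the emails by hand.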
Sentiment Analysis
• What is it?
• Determine the emotional tone behind a series of words
• Uses
• Political Polling: 2012 Presidential Election
• Business Purpose: TripAdvisor
6
Sentiment Analysis
Problem: How to identify whether a tweet is positive or negative
• Lexical Analysis
• ML Based Approach
7
Lexical Analysis
Input Tweet -> Tokenizer
Score: 0
8
Tokenization
• Input: Friends, Romans, Countrymen, lend me your ears;
• Output: Friends Romans Countrymen lend me your ears
9
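A minimal sketch of the tokenization step, assuming a simple regular-expression tokenizer (the slides do not say which tokenizer was actually used):

```python
import re

def tokenize(text):
    """Split text into word tokens and drop punctuation."""
    # Keep runs of letters, digits, and apostrophes; everything else separates tokens.
    return re.findall(r"[A-Za-z0-9']+", text)

tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(tokens)
# ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']
```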
Lexical Analysis
List of Tokens + Pre-tagged Dictionary -> Word Matching -> Match?
• Match with a positive word: Increment Score (Score++)
• Match with a negative word: Decrement Score (Score--)
10
Example
• “Beautiful impressionist paintings and outstanding sculptures. For
me, the original buildings were the best bit! The renovations and
creation of an amazing museum are a work of art in themselves.
Loved the paintings although a bit disappointed with the low number
of Van Gogh.” 😄
• Score: 0.301644
11
Example
Tokens: beautiful, impressionist, and, outstanding, …, best, …, amazing, …, love, …, disappoint, …
• Pre-Tagged Dictionary
• Positive: [beautiful, wonderful, best, outstanding, amazing, love, …]
• Negative: [disappoint, sad, unhappy, …]
• Score: 0.301644
12
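The lexical approach from the preceding slides can be sketched as follows. The word lists are the ones shown on the slide; the suffix-stripping helper `in_lexicon` is a hypothetical stand-in for real stemming, and the function returns a raw count because the slides do not say how the score of 0.301644 was normalized.

```python
import re

# Word lists copied from the slide's pre-tagged dictionary.
POSITIVE = {"beautiful", "wonderful", "best", "outstanding", "amazing", "love"}
NEGATIVE = {"disappoint", "sad", "unhappy"}

def in_lexicon(token, lexicon):
    """Hypothetical helper: match a token after stripping a few common suffixes."""
    for suffix in ("", "s", "d", "ed"):
        if token.endswith(suffix) and token[: len(token) - len(suffix)] in lexicon:
            return True
    return False

def lexical_score(text):
    """Start at 0, add 1 per positive match and subtract 1 per negative match."""
    score = 0
    for token in re.findall(r"[a-z]+", text.lower()):
        if in_lexicon(token, POSITIVE):
            score += 1
        elif in_lexicon(token, NEGATIVE):
            score -= 1
    return score

review = (
    "Beautiful impressionist paintings and outstanding sculptures. "
    "For me, the original buildings were the best bit! The renovations and "
    "creation of an amazing museum are a work of art in themselves. "
    "Loved the paintings although a bit disappointed with the low number of Van Gogh."
)
print(lexical_score(review))  # five positive matches, one negative match
```

A positive total suggests a positive review; dividing by the number of tokens or matches is one possible way to obtain a normalized score.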
Machine Learning Based Approach
Load & Pre-Process Data -> Extract Features -> Train Model -> Evaluate Model
13
ML Based Approach
• Load Data
• 25,000 labeled training tweets
• Another 25,000 validation tweets
• 50,000 test tweets
14
ML Based Approach
• Pre-Process Data:
• Remove punctuation: “I like this one!!!!!” -> “I like this one”
• Filter out stopwords: “this”, “the”
• Normalize each run of contiguous whitespace to a single space ' ': “  goodd” -> “goodd”
• Convert to lowercase: “Upper” -> “upper”
• Stemming or lemmatization: “Learning” -> “learn”, “Done” -> “do”
• Tokenization
15
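The pre-processing steps listed above could be chained roughly as below. This is a sketch using only the standard library: the stopword list is a small illustrative sample, and stemming is left out because it normally relies on an external library.

```python
import re
import string

# Illustrative stopword sample; real lists are much longer.
STOPWORDS = {"this", "the", "a", "an", "is"}

def preprocess(tweet):
    """Lowercase, remove punctuation, normalize whitespace, tokenize, drop stopwords."""
    text = tweet.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse whitespace runs to one space and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    tokens = text.split(" ") if text else []
    return [token for token in tokens if token not in STOPWORDS]

print(preprocess("I like   this one!!!!!"))
# ['i', 'like', 'one']
```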
ML Based Approach
• Extract Features
• Use Word2Vec model to map each word into an n-dimensional vector
• Each element of the vector can be viewed as a feature
16
What Is Word2Vec Model
• Use:
• Map each word to a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: a vector space with one vector per word: w = (w1, w2, …, wn)
• Given a word, get similar words
• Advantage:
• Preserves semantic relationships between words
17
What Is Word2Vec Model
vec(“king”) – vec(“man”) + vec(“woman”) ≈ vec(“queen”)
18
[Figure: word vectors for man, woman, king, and queen]
What Is Word2Vec Model
• Use: Map each word to a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: a vector space with one vector per word: w = (w1, w2, …, wn)
• Advantage:
• Preserves semantic relationships between words
• Feature:
• Measures how close words or phrases are to each other
• The angle between the vectors of two words indicates how similar the words are
19
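Both ideas above, the angle between vectors as a similarity measure and the king/queen analogy, can be illustrated with hand-made vectors. The three-dimensional vectors below are invented for this sketch; a trained Word2Vec model would supply vectors with 100 or more dimensions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy vectors, NOT output of a trained model.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
    "apple": [0.1, 0.1, 0.1],
}

def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))

print(analogy("king", "man", "woman"))  # queen
```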
20
How To Train A Word2Vec Model?
• Build the model using Gensim, an open-source Python toolkit
• tweet_w2v = Word2Vec(tweets, size=200, window=2, min_count=5, workers=4)
• Note: Gensim 4.0 and later name the size parameter vector_size
21
The quick brown fox jumps over the lazy dog.
How To Train A Word2Vec Model?
Source Text: The quick brown fox jumps over the lazy dog
Training Samples (window = 2), one row per input word:
• the: (the, quick), (the, brown)
• quick: (quick, the), (quick, brown), (quick, fox)
• brown: (brown, the), (brown, quick), (brown, fox), (brown, jumps)
• fox: (fox, quick), (fox, brown), (fox, jumps), (fox, over)
22
How To Train A Word2Vec Model?
Source Text: The quick brown rabbit jumps out of the sink
Training Samples (window = 2), one row per input word:
• the: (the, quick), (the, brown)
• quick: (quick, the), (quick, brown), (quick, rabbit)
• brown: (brown, the), (brown, quick), (brown, rabbit), (brown, jumps)
• rabbit: (rabbit, quick), (rabbit, brown), (rabbit, jumps), (rabbit, out)
23
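The training samples on these two slides can be reproduced with a short function. This is a simplified sketch of how (input word, nearby word) pairs are drawn with a fixed window; the function name `skipgram_pairs` is made up here, and real implementations differ in details such as window sampling.

```python
def skipgram_pairs(tokens, window=2):
    """Pair every word with each neighbor at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not paired with itself
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(tokens, window=2)
print(pairs[:5])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox')]
```

Because "fox" and "rabbit" end up paired with the same neighbors (quick, brown, jumps), training pushes their vectors toward each other.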
How To Train A Word2Vec Model?
For a given word such as “rabbit”, the model returns words that occur in similar contexts:
• Input:
• tweet_w2v.most_similar('rabbit')
• Output:
• [(u'fox', 0.7355118989944458), (u'jump', 0.7164269685745239), …]
24
How To Train A Word2Vec Model?
• Input:
• tweet_w2v.most_similar('good')
• Output:
• [(u'goood', 0.7355118989944458), (u'great', 0.7164269685745239),…]
25
Word2Vec Usage in TripAdvisor
User browsing sequence: Madrid, Lisbon, Barcelona, Boston
Sentence: “Madrid, Lisbon, Barcelona, Boston”
26
ML Based Approach
• Train the Model
• Represent each word using Word2Vec
• Combine these word vectors
• Train the classifier
27
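The slide does not say how the word vectors are combined. One common choice, assumed here purely for illustration, is to average them so that every tweet becomes a fixed-length feature vector for the classifier. The `word_vectors` dictionary stands in for a trained Word2Vec model.

```python
def average_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column.
    return [sum(column) / len(known) for column in zip(*known)]

# Made-up 2-dimensional vectors standing in for a trained model.
word_vectors = {"good": [1.0, 2.0], "movie": [3.0, 4.0]}
print(average_vector(["good", "movie", "zzz"], word_vectors, dim=2))
# [2.0, 3.0]
```

Averaging ignores word order, which is one reason negation and sarcasm remain hard for this kind of model.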
ML Based Approach
• Evaluate the Model
• Use the 50,000 test tweets to assess the model
• Accuracy: 0.78984528240986307
28
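Accuracy is the fraction of test tweets whose predicted label matches the true label. A minimal sketch, assuming labels of 1 for positive and 0 for negative and made-up predictions:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)

# Made-up labels: 1 = positive tweet, 0 = negative tweet.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```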
Challenges
• Some challenging examples
• “My flight’s been delayed. Brilliant!” ☹️ (Sarcasm)
• “I do not dislike cabin cruisers.” (Negation handling)
• Some promising work, but accuracy is still low
• “Contextualized Sarcasm Detection on Twitter”, David Bamman and Noah A. Smith
29
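Both challenging examples can be reproduced with a small lexicon scorer. The sketch below assumes tiny made-up word lists and handles negation with a static window that ends at punctuation, one simple strategy among several; it does nothing for sarcasm.

```python
import re

# Tiny made-up lexicons, for illustration only.
POSITIVE = {"like", "love", "brilliant"}
NEGATIVE = {"dislike", "hate"}
NEGATORS = {"not", "no", "never"}
PUNCTUATION = set(".!?;,")

def score(text, negation_window=0):
    """Lexicon score; polarity flips for `negation_window` words after a negator."""
    tokens = re.findall(r"[a-z']+|[.!?;,]", text.lower())
    total = 0
    flips_left = 0  # words still inside the current negation scope
    for token in tokens:
        if token in PUNCTUATION:
            flips_left = 0  # punctuation closes the negation scope
            continue
        if token in NEGATORS:
            flips_left = negation_window
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if flips_left > 0:
            polarity = -polarity
            flips_left -= 1
        total += polarity
    return total

print(score("I do not dislike cabin cruisers."))                     # -1
print(score("I do not dislike cabin cruisers.", negation_window=3))  # 1
print(score("My flight's been delayed. Brilliant!"))                 # 1
```

The window fixes the negation example, while the sarcastic tweet still scores as positive because nothing in the words themselves signals the sarcasm.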
• Online course:
• https://www.coursera.org/learn/natural-language-processing
• Open resource:
• https://nlp.stanford.edu/ : Stanford NLP Group
• https://arxiv.org/
30
Thank you!
31


Editor's Notes

  1. This is an example of tokenization.
  2. Each tweet is labeled 1 when it is positive and 0 when it is negative. The validation tweets are used to tune the model and to prevent overfitting. A neural network is used to train the hidden output layer.
  3. For example, patterns such as “Man is to Woman as King is to Queen” can be generated through algebraic operations on the vector representations of these words, such that the vector representation of “Brother” - “Man” + “Woman” produces a result that is closest to the vector representation of “Sister” in the model. The vector offsets are roughly parallel to each other.
  4. Now that we have some knowledge of Word2Vec, let me continue with how to train a Word2Vec model. The common way is to use Gensim; calling it builds a model for us. We feed the model a large corpus of sentences, which is used to build a vocabulary. size is the dimensionality of the word (feature) vectors. window is the maximum distance between the current and predicted word within a sentence. min_count tells the model to ignore all words with a total frequency lower than this. workers is the number of worker threads used to train the model; because the text corpus is really large, I set it to 4. What happens if we set the window size to 2 and the dimension to 200? Let me demonstrate this with only one input sentence. Given a specific word in the middle of a sentence (the input word), look at the words nearby. The output probabilities relate to how likely it is to find each vocabulary word near our input word. For example, if you gave the trained network the input word “Soviet”, the output probabilities would be much higher for words like “Union” and “Russia” than for unrelated words like “watermelon” and “kangaroo”.
  5. TripAdvisor recommendations use a Word2Vec model. For example, a user’s browsing sequence is “Madrid, …”, which means the user searched or browsed Madrid, then Boston, and so on. We can make up a sentence from the user’s browsing sequence; the sentence we use to feed the Word2Vec model is “Madrid, Lisbon, …”, just as we did for “The quick brown fox jumps over the lazy dog.” After being fed many such sentences from different users, the model learns quite well how geos are similar in meaning. Then, after I book a vacation rental in Boston, it will also recommend other places in Spain.
  6. Sarcasm is hard even for people to detect, because it depends on its context. The authors argue that the relationship between author and audience is central to understanding the sarcasm phenomenon. Their promising work looks at attributes of the author (author features), attributes of the intended recipient of a tweet (audience features), and attributes of responses to potentially sarcastic tweets (response features). For negation, one approach uses grammatical relations among words to model a sentence and hence to determine which words are affected by negation; another uses a static window and punctuation marks to determine the scope of negation. Using natural language processing to detect sarcasm on the internet still has a long way to go and may never be particularly reliable.
  7. Feel free to ask me any questions.