Spot Deceptive TripAdvisor Hotel Reviews
By: Yousef Fadila
Project Notebook:
https://github.com/yousef-fadila/cs548-project-5/blob/master/notebook.ipynb
CS548: Text Mining Project
Motivation - Fake reviews in the news
TripAdvisor warns of hotels posting fake reviews
http://abcnews.go.com/Technology/story?id=8094231
Twitter campaign takes aim at fake restaurant reviews on TripAdvisor
https://www.theguardian.com/travel/2015/oct/24/twitter-campaign-targets-fake-tripadvisor-restaurant-reviews
Datasets
Deceptive Opinion Spam Corpus
Consists of:
400 deceptive positive reviews
400 deceptive negative reviews
⇒ Written by Amazon Mechanical Turk workers
400 truthful positive reviews
400 truthful negative reviews
⇒ From trusted users on TripAdvisor
TripAdvisor Hotel Reviews
Consists of:
878,561 reviews from 4,333 hotels, crawled from TripAdvisor.
⇒ Includes metadata (hotel name, rating, stars, location, etc.)
Outline
Guiding Questions:
1. Which are more prevalent among the 200,000 sampled reviews: positive deceptive or negative deceptive reviews?
2. Which hotel star rating most commonly has deceptive reviews? Which are the top ten hotels by deceptive positive reviews?
3. Is there enough support to claim that deceptive positive reviews are used to cover up previous negative reviews?
Extra:
1. Would a 2-step approach based on domain knowledge (like the one presented in the anomaly detection showcase) improve the accuracy of the text classification model?
2. Demo: Try it yourself.
3. Are computers better than humans at detecting deceptive reviews?
Text Classification Model
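1. n-gram range (1, 3): unigrams through trigrams
2. min_df=3 (ignore terms that appear in fewer than 3 reviews)
3. max_df=0.96 (ignore terms that appear in more than 96% of reviews)
4. LinearSVC classifier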
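The settings above map naturally onto a scikit-learn pipeline. The sketch below is an illustration, not the notebook's code: the slide does not say which vectorizer was used, so `CountVectorizer` is an assumption, and the eight toy reviews are invented solely so that the example runs.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy reviews invented for illustration only (0 = truthful, 1 = deceptive).
reviews = [
    "the bathroom was clean and the price was fair",
    "check in was slow but the bathroom was clean",
    "the price was fair and check in was quick",
    "clean bathroom and fair price at check in",
    "my husband and i loved our amazing vacation",
    "amazing vacation with my husband we loved it",
    "my husband loved this amazing business trip vacation",
    "we loved our amazing vacation my husband was happy",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Vectorizer choice is an assumption; the parameter values come from the slide.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3), min_df=3, max_df=0.96),
    LinearSVC(random_state=0),
)
model.fit(reviews, labels)

predictions = model.predict([
    "my husband and i loved the amazing vacation",
    "the bathroom was clean and check in was quick",
])
```

With `min_df=3`, n-grams seen in fewer than three reviews are dropped, which keeps the trigram vocabulary from exploding on a corpus of only 1,600 labeled reviews.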
Positive deceptive vs. negative deceptive ratio
1. Which are more prevalent among the 200,000 sampled reviews: positive deceptive or negative deceptive reviews?
Answer:
Positive deceptive reviews are more prevalent.
Hotel Stars-Rating vs. Deceptive reviews rate
2. Which hotel star rating most commonly has deceptive reviews? Which hotels rank highest by ratio of deceptive positive reviews?
Top “deceptive” Hotels:
********Inn Houston
******** York Hotel
********ose Hotel
********a Inn Houston Wirt Road
********lmonico
Frequent Sequences Leading to Positive Deceptive Reviews
1. Pick 20 hotels with deceptive reviews.
2. Export all reviews of the selected hotels to an ARFF file.
3. Set the sequence ID to the hotel ID.
4. Run the GSP algorithm in Weka.
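The export step can be sketched as follows. The slide does not say which attributes were written to the ARFF file, so the two nominal attributes used here (`hotel_id` as the sequence ID and a hypothetical `review_class` label) are assumptions, chosen on the understanding that Weka's GSP works with nominal attributes. The rows are invented.

```python
import io


def write_arff(rows, out, relation="hotel_reviews"):
    """Write (hotel_id, review_class) rows as ARFF with nominal attributes.

    Rows are written in the order given, so they should already be grouped
    by hotel and sorted chronologically within each hotel.
    """
    hotel_ids = sorted({hotel_id for hotel_id, _ in rows})
    classes = sorted({review_class for _, review_class in rows})
    out.write(f"@relation {relation}\n\n")
    out.write("@attribute hotel_id {" + ",".join(hotel_ids) + "}\n")
    out.write("@attribute review_class {" + ",".join(classes) + "}\n\n")
    out.write("@data\n")
    for hotel_id, review_class in rows:
        out.write(f"{hotel_id},{review_class}\n")


# Hypothetical classified reviews for two hotels.
rows = [
    ("h1", "truthful_negative"),
    ("h1", "deceptive_positive"),
    ("h2", "truthful_negative"),
    ("h2", "truthful_positive"),
]
buffer = io.StringIO()
write_arff(rows, buffer)
arff_text = buffer.getvalue()
```

A frequent sequence such as a negative review followed by a deceptive positive one within the same hotel is the kind of pattern the third guiding question asks about.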
2 Step Approach
1. Would a 2-step approach based on domain knowledge (like the one presented in the anomaly detection showcase) improve the accuracy of the text classification model?
What features could be used to distinguish deceptive from truthful reviews?
False positives vs. false negatives.
Supervised vs. unsupervised.
Content Based Features
Some online reviews are too good to be true; Cornell computers spot 'opinion spam'
http://bit.ly/2g6ou9X
"The researchers then applied computer analysis based on subtle features of text. Truthful hotel reviews, for example, are more likely to use concrete words relating to the hotel, like 'bathroom,' 'check-in' or 'price.' Deceivers write more about things that set the scene, like 'vacation,' 'business trip' or 'my husband.' Truth-tellers and deceivers also differ in the use of keywords referring to human behavior and personal life, and sometimes in features like the amount of punctuation or frequency of 'large words.' In parallel with previous analysis of imaginative vs. informative writing, deceivers use more verbs and truth-tellers use more nouns."
Features to extract from the review text:
1. Amount of punctuation
2. Total nouns minus total verbs
3. Length of the review
4. Adjective and adverb ratio
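The four features listed above could be computed along these lines. This is a minimal sketch under stated assumptions: the part-of-speech tags are supplied by the caller as Penn Treebank tags (any tagger could produce them), the helper name `content_features` is hypothetical, and "adjective and adverb ratio" is interpreted as the share of tokens tagged as adjectives or adverbs, which the slide does not define.

```python
import string

# Penn Treebank tag groups (assumed tag set).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
ADJ_ADV_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}


def content_features(text, tagged_tokens):
    """Return the four content-based features for one review.

    tagged_tokens is a list of (token, tag) pairs for the same text.
    """
    tags = [tag for _, tag in tagged_tokens]
    nouns = sum(tag in NOUN_TAGS for tag in tags)
    verbs = sum(tag in VERB_TAGS for tag in tags)
    adj_adv = sum(tag in ADJ_ADV_TAGS for tag in tags)
    return {
        "punctuation": sum(ch in string.punctuation for ch in text),
        "nouns_minus_verbs": nouns - verbs,
        "length": len(text),  # length in characters (an assumption)
        "adj_adv_ratio": adj_adv / len(tags) if tags else 0.0,
    }


text = "The bathroom was very clean!"
tagged = [
    ("The", "DT"), ("bathroom", "NN"), ("was", "VBD"),
    ("very", "RB"), ("clean", "JJ"), ("!", "."),
]
features = content_features(text, tagged)
```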
1st Try: Unsupervised Anomaly Detection (AD) Followed by a Supervised Classifier
No Improvement!
2nd Try: Single-Step Supervised Model
Merge the "bag of words" features and the extracted content-based features into one feature set for a single supervised classifier.
No Improvement!
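Merging the two feature sets amounts to stacking the columns side by side before training. The sketch below shows one way to do that with scikit-learn and SciPy. It is not taken from the notebook: the toy reviews are invented, and the helper `simple_content_features` uses only two text-derived features (punctuation count and length) as stand-ins for the full content-based set.

```python
import string

from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy data invented for illustration (0 = truthful, 1 = deceptive).
reviews = [
    "the bathroom was clean, the price was fair.",
    "check in was slow but the room was fine.",
    "my husband loved our amazing vacation!!!",
    "amazing business trip, we loved everything!!",
]
labels = [0, 0, 1, 1]


def simple_content_features(text):
    """Hypothetical stand-in features: punctuation count and length."""
    return [sum(ch in string.punctuation for ch in text), len(text)]


vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(reviews)  # sparse bag-of-words matrix
content = csr_matrix(
    [simple_content_features(r) for r in reviews], dtype=float
)

# One combined matrix: bag-of-words columns followed by content columns.
combined = hstack([bow, content]).tocsr()
classifier = LinearSVC(random_state=0).fit(combined, labels)
```

In practice the content columns would usually be scaled first, since raw review length is on a much larger scale than word counts and can otherwise dominate a linear model.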
3rd Try: Change Topology
Two supervised text classification models:
Positive vs. negative: based only on "bag of words".
Deceptive vs. truthful: uses both bag-of-words and content-based features.
3rd Try: Change Topology - Result
Overall improvement: 7%!
Demo: Try it yourself
www.yousef.fadila.net/cs548
REST API:
POST request to:
www.yousef.fadila.net/cs548/review_checker
Payload: {'review_text': text}
Sample response: {"result": "Likely Fake"}
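A client for this endpoint could look like the sketch below. Several details are assumptions: the slide does not say whether the payload is JSON or form-encoded (JSON is used here), the `http://` scheme is guessed, and the demo server may no longer be running, so the example only builds the request and parses the sample response from the slide without sending anything.

```python
import json
import urllib.request

# Scheme is an assumption; the slide gives only the host and path.
API_URL = "http://www.yousef.fadila.net/cs548/review_checker"


def build_request(review_text):
    """Build a POST request carrying the review as a JSON payload."""
    payload = json.dumps({"review_text": review_text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_result(body):
    """Extract the verdict string from a response body."""
    return json.loads(body)["result"]


request = build_request("Great hotel, my husband loved it!")
verdict = parse_result('{"result": "Likely Fake" }')

# To send the request for real (requires the server to be reachable):
#     body = urllib.request.urlopen(request).read()
#     print(parse_result(body))
```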
Computers vs. Humans
Are computers better than humans at detecting deceptive reviews?
Survey of WPI students
74 WPI students responded.
Students were given 5 positive reviews and asked to decide whether each was truthful or deceptive.
The list intentionally includes reviews that the model from the 1st experiment did not classify correctly.
Computers vs. Humans - Result
This is neither a scientific study nor a statistical one!
This is only a game! In fact, it is an unfair game, because the reviews come from the dataset the model was trained on.
The purpose of the game is to show whether humans' truth bias (assuming that what they are reading is true until they find evidence to the contrary) could affect their ability to spot deceptive reviews.
Review    Computers    Humans
1         1            1
2         1            0
3         0            0
4         1            1
5         1            1
Total     4            3
Any Questions?
