Fake Review Detection on Yelp Dataset
Presented by Group 12
What is the Question?
Identify fake reviews in the Yelp dataset for NYC.
Dataset Description
Source: Originally collected by Rayana and Akoglu for their research and shared with us on
request.
Format: TSV
Description: Multiple files containing review text, products, users, and labels marking each
review as true or fake.
Size: 358,957 records
More About Data
No. of Products: 923
No. of Users: 160,201
Time period: 1st Jan, 2007 - 9th Sep, 2014
No. of Labelled Fake Reviews: 36,860 (approx. 10% of overall)
No. of Labelled True Reviews: 322,097 (approx. 90% of overall)
Literature Review (Most Relevant)
1. Fake Review Detection on Yelp by Zehui Wang (wzehui), Yuzhu Zhang (arielzyz),
Tianpei Qian (tianpei).
a. Applied various models using linguistic and behavioral characteristics.
b. Reported good accuracy with neural networks.
2. Deceptive review detection using labeled and unlabeled data by Jitendra Kumar Rout,
Smriti Singh, Sanjay Kumar Jena, Sambit Bakshi.
a. Text categorisation (n-grams).
b. Sentiment score.
3. What Yelp Fake Review Filter Might Be Doing? by Arjun Mukherjee, Vivek
Venkataraman, Bing Liu, Natalie Glance.
a. Compared Amazon Mechanical Turk (AMT) fake reviews with the Yelp dataset.
b. Used text as well as behavioral characteristics for identification.
Data Preprocessing
1. Merging datasets
2. Checking missing values
3. Checking duplicate rows
4. Text Processing
a. Removing Stopwords, Punctuation, Special characters
b. Lowercase the review text
c. Identifying common words in true and false reviews and removing them
d. Stemming - reducing inflection in words to their root forms
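The text processing steps above can be sketched roughly as follows. This is a minimal illustration, not the group's actual pipeline: the stopword list is a tiny hypothetical one, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and `extra_stopwords` is an assumed way to pass in the words found to be common to both true and fake reviews.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "in", "it", "this", "i"}

# Suffixes removed by the naive stemmer below (an assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, extra_stopwords=frozenset()):
    """Lowercase, drop punctuation and special characters, remove stopwords, stem."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep only letters, digits, whitespace
    tokens = [t for t in text.split() if t not in STOPWORDS and t not in extra_stopwords]
    return [naive_stem(t) for t in tokens]


tokens = preprocess("The waiters were AMAZING, and the food tasted great!!!")
```

Words shared by both classes (step c) would be passed as `extra_stopwords`, for example `preprocess(text, extra_stopwords={"food"})`.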
Exploratory Data Analysis
Comparing the number of reviews at each rating for true and fake reviews.
Most fake reviews carry a rating of 4 or 5, so fake reviews are generally positive.
Possibly a marketing strategy?
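A comparison like this can be computed by counting reviews per (label, rating) pair and normalising within each label. The sketch below assumes each review is available as a `(label, rating)` tuple; the sample rows are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def rating_share_by_label(reviews):
    """reviews: iterable of (label, rating). Returns {label: {rating: share of that label}}."""
    counts = Counter(reviews)
    totals = Counter()
    for (label, _rating), n in counts.items():
        totals[label] += n
    shares = {}
    for (label, rating), n in counts.items():
        shares.setdefault(label, {})[rating] = n / totals[label]
    return shares


# Made-up sample rows, only to show the shape of the input.
sample = [("fake", 5), ("fake", 4), ("fake", 5), ("fake", 1), ("true", 3), ("true", 5)]
shares = rating_share_by_label(sample)
```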
Word Cloud of Words in reviews
False Reviews
True Reviews
Behavioral Features Extracted
1. Behavioral analysis of the user’s review pattern
a. Average user rating
b. Total reviews given by user
2. How the restaurant performed in general
a. Average restaurant rating
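One way to derive these three features is to group ratings by user and by restaurant, then attach the aggregates back to each review. The key names `user_id`, `prod_id`, and `rating` are assumptions for this sketch; the actual column names in the TSV files may differ.

```python
from collections import defaultdict


def behavioral_features(reviews):
    """reviews: list of dicts with (assumed) keys user_id, prod_id, rating."""
    user_ratings = defaultdict(list)
    prod_ratings = defaultdict(list)
    for r in reviews:
        user_ratings[r["user_id"]].append(r["rating"])
        prod_ratings[r["prod_id"]].append(r["rating"])

    rows = []
    for r in reviews:
        u = user_ratings[r["user_id"]]
        p = prod_ratings[r["prod_id"]]
        rows.append({
            **r,
            "avg_user_rating": sum(u) / len(u),        # average rating given by this user
            "user_review_count": len(u),               # total reviews written by this user
            "avg_restaurant_rating": sum(p) / len(p),  # average rating of this restaurant
        })
    return rows


# Made-up rows for illustration only.
rows = behavioral_features([
    {"user_id": 1, "prod_id": "a", "rating": 5},
    {"user_id": 1, "prod_id": "b", "rating": 3},
    {"user_id": 2, "prod_id": "a", "rating": 1},
])
```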
Text Features Extracted
Features extracted from the review text:
1. Sentiment Score
2. Number of Nouns
3. Review Length
4. Number of Capital Words
5. Number of Digits
6. TF-IDF Vectorizer with N-grams (Trigrams)
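The simpler count features can be sketched with the standard library alone. The definitions here are assumptions: review length is counted in whitespace-separated words, a "capital word" is taken to be a fully uppercase word longer than one character, and `word_ngrams` only produces the n-grams that a TF-IDF vectorizer would then weight. Sentiment scores and noun counts need an NLP library and are left out.

```python
def simple_text_features(text):
    """Count-based features computed on the raw (unprocessed) review text."""
    words = text.split()
    return {
        "review_length": len(words),
        # Assumed definition: fully uppercase words longer than one character.
        "capital_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "digit_count": sum(ch.isdigit() for ch in text),
    }


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


features = simple_text_features("BEST pizza in NYC, 10/10! I paid $5.")
trigrams = word_ngrams(["best", "pizza", "in", "nyc"], 3)
```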
Dataset After Feature Extraction
Corpus: review text after text processing
Compound, neg, pos, neu: sentiment scores
What about unbalanced data?
Techniques used:
1. Random Oversampling
a. Increases the minority class by repeating existing samples.
2. Synthetic Minority Over-sampling Technique (SMOTE)
a. Creates new minority-class training samples by interpolating between existing ones, adding variety.
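The difference between the two techniques can be seen in a small sketch. `random_oversample` repeats existing minority samples, while `smote_like` is a simplified version of the SMOTE idea: it places each new point on the line between a minority sample and one of its nearest minority neighbours. Function names, the parameter `k`, and the toy data are assumptions for illustration, and this is not the group's implementation.

```python
import random


def random_oversample(samples, labels, rng):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def smote_like(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


rng = random.Random(0)
samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0), (11.0, 12.0)]
labels = ["true"] * 4 + ["fake"] * 2
balanced_x, balanced_y = random_oversample(samples, labels, rng)
new_points = smote_like([(10.0, 10.0), (11.0, 12.0), (12.0, 11.0)], n_new=5, k=2, rng=rng)
```

In practice, resampling is usually applied to the training split only, so that evaluation still reflects the original class balance.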
Methods used for classification
1. Logistic Regression
a. Estimates the relationship between one dependent variable (here, the fake/true label) and one
or more independent variables
2. Naive Bayes Classifier
a. Probabilistic machine learning model used for classification tasks
b. Based on Bayes' theorem
3. K-Nearest Neighbors
a. Non-parametric approach to classification
b. Chose k = n^(1/2) (the square root of n), where n is the number of samples (number of rows)
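To make the choice of k concrete, here is a minimal K-Nearest Neighbors sketch with k set to the rounded square root of the number of training rows. It assumes numeric feature tuples and Euclidean distance; the toy points and labels are invented, and a real run on a dataset of this size would use a library implementation instead.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=None):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    if k is None:
        k = max(1, round(math.sqrt(len(train_x))))  # k = sqrt(n), as on the slide
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]


# Nine made-up training rows, so k defaults to 3.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (3, 3)]
train_y = ["true", "true", "true", "true", "fake", "fake", "fake", "fake", "true"]
prediction = knn_predict(train_x, train_y, (5.5, 5.5))
```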
Results
References
1. Rayana, S. and Akoglu, L., 2015, August. Collective opinion spam detection: Bridging review
networks and metadata. In Proceedings of the 21st ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (pp. 985-994). ACM. Citation Count: 120
2. Mukherjee, A., Venkataraman, V., Liu, B. and Glance, N., 2013, June. What Yelp fake review filter
might be doing? In Seventh International AAAI Conference on Weblogs and Social Media. Citation
Count: 242
3. Wang, Z., Zhang, Y. and Qian, T., Fake Review Detection on Yelp.
4. Rout, J.K., Singh, S., Jena, S.K. and Bakshi, S., 2017. Deceptive review detection using labeled and
unlabeled data. Multimedia Tools and Applications, 76(3), pp.3187-3211. Citation Count: 15
5. Singh, M., Kumar, L. and Sinha, S., 2018. Model for detecting fake or spam reviews. In ICT Based
Innovations (pp. 213-217). Springer, Singapore. Citation Count: 3
