YELP RECOMMENDATION SYSTEM AND SENTIMENT
ANALYSIS
DEVESH KANDPAL
SOHAM DASNEOGI
YELP DATASET
 The Yelp dataset is a subset of businesses, reviews and user data
 Each file consists of a single object type, one JSON-object-per-line
 Yelp aggregates review data from its users and ranks restaurants based on them
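A minimal sketch of reading a one-JSON-object-per-line file, as described above. The in-memory sample and the `name` and `stars` fields are illustrative assumptions; a real dataset file would be opened with `open(path)` instead.

```python
import io
import json


def read_json_lines(handle):
    """Yield one dict per non-empty line of a JSON-lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample standing in for a real dataset file.
sample = io.StringIO(
    '{"business_id": "b1", "name": "Cafe A", "stars": 4.0}\n'
    '{"business_id": "b2", "name": "Diner B", "stars": 3.5}\n'
)
businesses = list(read_json_lines(sample))
```

Reading line by line keeps memory use flat, which matters when a file holds millions of records.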
GOALS
 Using the user profile, we recommend restaurants by combining geospatial location with
collaborative filtering
 Using the user reviews, we perform sentiment analysis on the reviews of the recommended
restaurants
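The slides do not say how location and collaborative filtering are combined. One plausible approach, sketched here purely as an assumption, is to keep only restaurants within a radius of the user and rank them by the rating the collaborative filtering model predicts. The function names, the `latitude`, `longitude` and `predicted_rating` keys, and the sample values are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearby_top_n(user_loc, candidates, max_km, n):
    """Keep candidates within max_km of the user, best predicted rating first."""
    lat, lon = user_loc
    close = [c for c in candidates
             if haversine_km(lat, lon, c["latitude"], c["longitude"]) <= max_km]
    return sorted(close, key=lambda c: c["predicted_rating"], reverse=True)[:n]


# Hypothetical candidates; predicted_rating would come from the CF model.
candidates = [
    {"business_id": "b1", "latitude": 36.10, "longitude": -115.17, "predicted_rating": 4.2},
    {"business_id": "b2", "latitude": 36.11, "longitude": -115.18, "predicted_rating": 4.6},
    {"business_id": "b3", "latitude": 40.71, "longitude": -74.00, "predicted_rating": 4.9},
]
picks = nearby_top_n((36.10, -115.17), candidates, max_km=5.0, n=2)
```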
SYSTEM ARCHITECTURE
TECHNOLOGY STACK
 Apache Spark (PySpark) for data cleaning and segregation
 Flask for building the web application
 MongoDB for user and restaurant details, and for storing the results of the LDA-based Opinion Mining
and the Sentiment Analysis
 Spark ML for ALS Matrix Factorization Model
 Gensim for Topic Modeling on restaurant reviews, with TextBlob for sentiment polarity
 Google Cloud Compute for hosting the data pipeline
RECOMMENDER SYSTEM
 The two most common types of recommender systems are Content-Based and Collaborative Filtering
 Collaborative Filtering produces recommendations based on knowledge of users’ attitudes to items;
that is, it uses the wisdom of the crowd to recommend items
 Content-Based recommender systems focus on the attributes of the items and give
recommendations based on the similarity between them
COLLABORATIVE FILTERING
 Collaborative Filtering is the more commonly used of the two because it usually gives better results
 The algorithm can do feature learning, i.e. it can learn for itself which features to use
 Collaborative Filtering is further divided into MEMORY-BASED CF and MODEL-BASED CF
MODEL BASED COLLABORATIVE FILTERING
 Model-Based CF is based on matrix factorization, an unsupervised learning method for latent
variable decomposition and dimensionality reduction
 We use the ALTERNATING LEAST SQUARES (ALS) Matrix Factorization method
 ALS approximates the user-product rating matrix as the product of a user factor matrix and a
product factor matrix
 The key idea of ALS is that it alternates: it holds one factor matrix fixed, solves a least-squares
problem for the other, and then swaps their roles
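The alternation can be written out as a regularized least-squares problem. The notation is an assumption for illustration and does not come from the slides: $r_{ij}$ is the rating user $i$ gave product $j$, $\Omega$ is the set of observed ratings, $u_i$ and $v_j$ are the latent factor vectors, and $\lambda$ is a regularization weight.

```latex
\min_{U,V} \sum_{(i,j)\in\Omega} \bigl(r_{ij} - u_i^{\top} v_j\bigr)^2
  + \lambda \Bigl(\sum_i \lVert u_i \rVert^2 + \sum_j \lVert v_j \rVert^2\Bigr)

% With V held fixed, each user vector has a closed-form update
% (\Omega_i is the set of products rated by user i):
u_i = \Bigl(\sum_{j\in\Omega_i} v_j v_j^{\top} + \lambda I\Bigr)^{-1}
      \sum_{j\in\Omega_i} r_{ij}\, v_j
```

The update for each $v_j$ with $U$ held fixed is symmetric, which is why each half-step reduces to an ordinary least-squares solve.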
ALTERNATING LEAST SQUARES
 ALS modelling is an unsupervised technique that takes latent factors into consideration
 Our model is built on three features: user, business, and review rating
 We construct a Matrix Factorization model using these three features
 The resulting model is capable of answering the following:
 I. Products for a given User
 II. Users of a given Product
 We evaluate the model on the basis of its RMSE score
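For reference, the root-mean-square error over a held-out set can be written as follows. The symbols are assumptions for illustration: $T$ is the set of held-out (user, product) pairs, $r_{ij}$ the true rating, and $\hat{r}_{ij}$ the model's prediction.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(i,j)\in T} \bigl(r_{ij} - \hat{r}_{ij}\bigr)^2}
```

Because the error is in the same units as the ratings, an RMSE of 1 means predictions are off by roughly one star on average.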
STEPS FOR ALS MODELLING
 For each state, we load the JSON file containing a combined view of user reviews and restaurants
 We select user_id, business_id, and rating, and do an 80:20 random split on the data
 The training data is used to train the ALS model
 The testing data is used to compute the RMSE
 We get an RMSE score of 1.3
 The models are pickled and saved so that they can be consumed by the Flask application
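The steps above can be sketched end to end in plain Python. This is a small stand-in for illustration, not the Spark ML ALS the slides use: the data is synthetic, and the function names, rank, regularization weight, and iteration count are all assumptions.

```python
import math
import pickle
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, ratings_by_row, rank, reg):
    """Least-squares update of one factor matrix while the other is held fixed."""
    out = {}
    for key, pairs in ratings_by_row.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        out[key] = solve(a, b)
    return out


def train_als(train, rank=2, reg=0.1, iters=10, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in train:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, rank, reg)  # items fixed, solve users
        items = update(users, by_item, rank, reg)  # users fixed, solve items
    return users, items


def predict(model, u, i, default):
    users, items = model
    if u not in users or i not in items:
        return default  # unseen user or item: fall back to the training mean
    return sum(x * y for x, y in zip(users[u], items[i]))


def rmse(model, data, default):
    err = [(r - predict(model, u, i, default)) ** 2 for u, i, r in data]
    return math.sqrt(sum(err) / len(err))


# Synthetic (user_id, business_id, rating) triples, not Yelp data.
rng = random.Random(42)
ratings = [(f"u{u}", f"b{b}", float(1 + (u + b) % 5))
           for u in range(20) for b in range(10) if rng.random() < 0.6]
rng.shuffle(ratings)
cut = int(0.8 * len(ratings))  # 80:20 random split
train, test = ratings[:cut], ratings[cut:]
mean = sum(r for _, _, r in train) / len(train)

model = train_als(train)
train_rmse = rmse(model, train, mean)
test_rmse = rmse(model, test, mean)
restored = pickle.loads(pickle.dumps(model))  # round trip, as a saved model would be
```

In a web application, the pickled bytes would be written to disk once and loaded at start-up, so requests only pay for `predict`.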
SENTIMENT ANALYSIS
 Sentiment Analysis, or opinion mining, is the process of determining whether a review is
POSITIVE, NEGATIVE or NEUTRAL
 It derives the opinion of the writer
 We have used Latent Dirichlet Allocation (LDA) to mine the user reviews. The idea is to showcase what
people in the consolidated reviews are talking about, so that business users get a high-level view
without having to read each review
STEPS FOR SENTIMENT ANALYSIS
 Combine all reviews of a single restaurant into a single document
 The entire document is tokenized and cleaned using the stop words provided in the gensim library
 We then use TextBlob on this cleaned text to determine sentiment polarity
 We then use the gensim library to create first a dictionary and then a corpus from the cleaned text
 Once the corpus and dictionary are created, we use an unsupervised learning technique called LDA for topic
modelling
 We repeat this process for all the restaurants across all the states
 We then use the LDA model to identify the top topics in the corpus and store the top 15 tokens of each topic in a
dictionary, which is then dumped into MongoDB along with the calculated polarity of the reviews and other
details
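The document preparation steps above can be sketched in plain Python. This is a stand-in that mirrors the roles gensim's dictionary and bag-of-words corpus play; it does not use gensim, and it leaves out the LDA fit and the TextBlob polarity. The stop-word list, function names, and sample reviews are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the slides use the one shipped with gensim.
STOP_WORDS = {"the", "a", "and", "was", "is", "were", "but", "of", "to"}


def clean(document):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_dictionary(docs):
    """Map each distinct token to an integer id, in first-seen order."""
    token2id = {}
    for tokens in docs:
        for t in tokens:
            token2id.setdefault(t, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Bag of words: sorted (token_id, count) pairs for one document."""
    counts = Counter(token2id[t] for t in tokens)
    return sorted(counts.items())


# Hypothetical reviews grouped by restaurant id.
reviews = {
    "b1": ["The pizza was great", "Great crust and the service was friendly"],
    "b2": ["Service was slow but the noodles were tasty"],
}
# One document per restaurant, then clean, index, and vectorize.
docs = {bid: clean(" ".join(texts)) for bid, texts in reviews.items()}
token2id = build_dictionary(docs.values())
corpus = {bid: to_bow(tokens, token2id) for bid, tokens in docs.items()}
```

The bag-of-words pairs are the input an LDA model consumes: topic modelling only needs token counts per document, not word order.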
GOOGLE CLOUD PLATFORM
ROLE
 Devesh
 EDA on business.json
 Preprocessing by splitting and combining using a Spark job
 Model building - LDA Opinion Mining for each restaurant and dumping to MongoDB
 Getting data from MongoDB against restaurants to view the results of opinion mining
 Cloud Deployment including setting up of Java, Scala, Spark, MongoDB and Flask Application
 Soham
 EDA on business.json, user.json, review.json
 Preprocessing data
 Model building - SVD and ALS models and pickling the models
 Building UI components with dynamic page navigation
 Backend Flask logic for functionalities
