Jongwook Woo
BigDAI
CalStateLA
KMIS - Fall 2021
November 12, 2021
Savita Yadav, syadav5@calstatela.edu
Samyuktha Muralidharan, Jongwook Woo
Big Data AI Center (BigDAI)
California State University, Los Angeles
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
o Why we need to predict Airbnb Listing Rating
o Dataset Details
o Hardware Specifications
o Machine Learning algorithms used
o Flowchart of the project
o Comparison of results of Spark ML algorithms
Need for Predicting Airbnb Listing Rating
o The objective is to build a model that classifies a property as highly rated or low rated based on the features of the listing.
o The model helps Airbnb hosts make simple changes to the properties they list in order to boost customer satisfaction and attract potential bookings.
o It can serve as a baseline for understanding the factors that contribute to the popularity and rating of a listing.
Dataset Details
o Dataset: Airbnb Listings
o Total Dataset size: 4 GB, 400 MB
o Dataset Format: CSV
o Dataset URL: https://public.opendatasoft.com/explore/dataset/airbnb-listings/table/?disjunctive.host_verifications&disjunctive.amenities&disjunctive.features
Technical Specifications
Spark: Databricks Subscription
Databricks Runtime version: DBR 8.2 ML (Apache Spark 3.1.1, GPU, Scala 2.12)
Instance: g4dn.xlarge
Memory: 64.0 GB
Nodes: 4
CPU Cores: 16
File System: DBFS (Databricks File System)
Python Version: 3.8.6
Machine Learning Algorithms Used
Rating Prediction
Spark ML:
–Decision Tree Classifier
–Random Forest Classifier
–Gradient Boosted Tree Classifier
–Logistic Regression
–Support Vector Machine
Airbnb Rating Prediction
o Goal: predict whether a listing has a good rating or not.
o The Review Scores Rating column is converted to a categorical label:
   – Review Scores Rating >= 80 -> High Rating
   – Review Scores Rating < 80 -> Low Rating
o Two-class classification algorithms are used to build a model that classifies listings as high rated or low rated, based on the features of the listing.
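The labeling rule on this slide can be sketched in plain Python. This is a minimal illustration, not the authors' Spark code: the function name `label_rating` and the handling of missing scores are assumptions, and only the threshold of 80 comes from the slide.

```python
from typing import List, Optional

HIGH_RATING_THRESHOLD = 80  # threshold stated on the slide


def label_rating(review_scores_rating: Optional[float]) -> Optional[str]:
    """Map a numeric Review Scores Rating to a two-class label."""
    if review_scores_rating is None:
        # Assumption: listings without a score are left unlabeled.
        return None
    if review_scores_rating >= HIGH_RATING_THRESHOLD:
        return "High Rating"
    return "Low Rating"


# Hypothetical scores, chosen to exercise both sides of the threshold.
scores: List[Optional[float]] = [95.0, 80.0, 79.9, None]
labels = [label_rating(s) for s in scores]
print(labels)
```

A score of exactly 80 falls in the High Rating class because the rule uses `>=`.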
Rating Prediction Measurement with Big Data cluster on AWS
Accuracy:
o We aim to reduce the number of False Positives in order to achieve a higher Precision value.
o The AUC value is another significant indicator of model accuracy.
Computing Time:
o Time taken to train the models.
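To make the metrics on this slide concrete, here is a small plain-Python sketch of precision, recall, and AUC. It is an illustration under stated assumptions, not the evaluation code used in the project: the helper names and the toy labels and scores are hypothetical, and AUC is computed by pairwise ranking rather than by any Spark evaluator.

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc(y_true: List[int], scores: List[float]) -> float:
    """Chance that a random positive is scored above a random negative.

    Ties count as half a win. This is the rank-based definition of the
    area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = High Rating, 0 = Low Rating.
y_true = [1, 1, 1, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall, auc(y_true, y_score))
```

Because false positives appear only in the precision denominator, removing them raises precision without changing recall.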
Decision Tree
o File Size: 400 MB
o Split Train/Test: 70:30
o Time Taken to run: 1.15 minutes
o AUC: 0.972
o Precision: 0.983
o Recall: 0.984
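The 70:30 train/test split listed on each result slide can be sketched as follows. This is a plain-Python illustration using a seeded shuffle. The function name, the seed, and the shuffle-then-cut approach are assumptions, and the slides do not say how the split was actually performed in Spark.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], train_fraction: float = 0.7, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly, then cut them into train and test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for listing rows.
rows = list(range(10))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed keeps the split identical across runs, which matters when comparing several models on the same data.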
Random Forest
o File Size: 400 MB
o Split Train/Test: 70:30
o Time Taken to run: 2.13 minutes
o AUC: 0.979
o Precision: 0.985
o Recall: 0.993
Gradient Boosted Tree
o File Size: 400 MB
o Split Train/Test: 70:30
o Time Taken to run: 2.35 minutes
o AUC: 0.977
o Precision: 0.984
o Recall: 0.993
Logistic Regression
o File Size: 400 MB
o Split Train/Test: 70:30
o Time Taken to run: 1.47 minutes
o AUC: 0.959
o Precision: 0.968
o Recall: 0.998
Support Vector Machine
o File Size: 400 MB
o Split Train/Test: 70:30
o Time Taken to run: 4.49 minutes
o AUC: 0.958
o Precision: 0.966
o Recall: 0.998
Comparison of results of Spark ML Algorithms
Models               Computing Time  Precision  Recall  AUC
Decision Tree        1.15 mins       0.983      0.984   0.972
Random Forest        2.13 mins       0.985      0.993   0.979
GBT Classifier       2.35 mins       0.984      0.993   0.977
Logistic Regression  1.47 mins       0.968      0.998   0.959
SVM                  4.49 mins       0.966      0.998   0.958
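The reported figures in this comparison can be ranked with a few lines of Python. This is only a sketch for reading the table: the `Result` type and variable names are hypothetical, and the numbers are copied from the slide rather than recomputed.

```python
from typing import Dict, NamedTuple


class Result(NamedTuple):
    minutes: float
    precision: float
    recall: float
    auc: float


# Values copied from the comparison table on this slide.
RESULTS: Dict[str, Result] = {
    "Decision Tree": Result(1.15, 0.983, 0.984, 0.972),
    "Random Forest": Result(2.13, 0.985, 0.993, 0.979),
    "GBT Classifier": Result(2.35, 0.984, 0.993, 0.977),
    "Logistic Regression": Result(1.47, 0.968, 0.998, 0.959),
    "SVM": Result(4.49, 0.966, 0.998, 0.958),
}

fastest = min(RESULTS, key=lambda m: RESULTS[m].minutes)
best_auc = max(RESULTS, key=lambda m: RESULTS[m].auc)
best_precision = max(RESULTS, key=lambda m: RESULTS[m].precision)

print("Fastest:", fastest)
print("Highest AUC:", best_auc)
print("Highest precision:", best_precision)
```

Ranking by a single column at a time makes the trade-off between computing time and accuracy explicit.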
Conclusion
o Decision Tree is the most efficient in computing time, at 1.15 minutes.
o The RF and GBT models performed well at classifying listings in the United States.
o RF is the optimal model, since accuracy is more important than computing time.
