2. Contents
• Business Objective – Problem Statement
• Solution Methodology
• Data Preparation and Consolidation
• Key Challenges
• Result
• Conclusion
3. The Google Play Store team is about to launch a new feature in which certain promising apps are boosted in visibility. The boost will
manifest in multiple ways, including higher priority in recommendation
sections ("Similar apps", "You might also like", "New and updated
games") and greater visibility in search results. This
feature will help bring more attention to newer apps that have potential.
Build a model that predicts an app's rating from the other information
provided about the app.
Business Objective – Problem Statement
4. Solution Methodology
• Data Cleaning
• Data Visualization
• Outlier Treatment with IQR
• Removing Skewness in the Data
• Model Building
• Loss Calculation
Technology Enablers: NumPy, Pandas, Plotly, Seaborn, Cufflinks, Sklearn
Realized Objective: a prediction model with minimum error
5. Data Preparation and Consolidation
• The data frame has 10,841 rows and 13 distinct columns; the dependent variable is "Rating." While visualizing
the "Rating" values, we detected invalid ratings in the data, since apps are rated between
0 and 5 stars.
• Removing variables with no meaningful correlation with the target: Last Updated, Current Ver,
Android Ver.
• Transforming the data into numerical form: string columns are parsed and converted to
numeric types using several methods.
• Cleaning the data by imputing all null values with the median, and dropping rows that fall
outside the interquartile range to remove the outliers detected earlier.
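The preparation steps above can be sketched in Pandas. This is a minimal sketch, not the authors' actual code: the column names (`Rating`, `Installs`, `Price`, `Last Updated`, `Current Ver`, `Android Ver`) follow the public Kaggle Play Store dataset and are assumed to match the data used here.

```python
import pandas as pd

def clean_playstore_df(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps: drop invalid ratings, drop weakly
    correlated columns, convert strings to numbers, impute with the median."""
    df = df.copy()
    # Apps are rated between 0 and 5 stars; keep nulls for imputation below.
    df = df[df["Rating"].isna() | df["Rating"].between(0, 5)]
    # Drop variables with no meaningful correlation with the target.
    df = df.drop(columns=["Last Updated", "Current Ver", "Android Ver"],
                 errors="ignore")
    # Convert string columns such as Installs ("1,000+") and Price ("$0.99")
    # to numeric form.
    df["Installs"] = (df["Installs"].astype(str)
                        .str.replace(r"[+,]", "", regex=True)
                        .astype(float))
    df["Price"] = (df["Price"].astype(str)
                     .str.replace("$", "", regex=False)
                     .astype(float))
    # Impute missing ratings with the median to limit bias.
    df["Rating"] = df["Rating"].fillna(df["Rating"].median())
    return df
```

Outlier removal via the interquartile range would then be applied to the numeric columns of the cleaned frame.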
6. Data Preparation and Consolidation (Results)
[Figure: rating distribution after removal of outliers through the interquartile range]
8. Key Challenges
• Imputing null values in the dependent variable with the median to reduce bias in the final prediction
model (imputing with the mean increased deviation in the data)
• Removing unwanted characters and converting each column to its proper datatype
• Plotting the data with box, bar, and cumulative-distribution plots, for individual attributes and for
combinations of attributes, to identify outliers and the distribution of the data within a given range
• Treating outliers with the interquartile range, and reducing skewness with a log transformation to
remove non-linearity
• Limited data available for training the model, so only minimal patterns could be identified for prediction
• Comparing the results of all the algorithms to determine the best model, which is ambiguous when one
model has good accuracy but larger error and vice versa
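The outlier and skewness treatments named above can be sketched as two small helpers. This is a hedged sketch under the usual conventions (1.5 × IQR fences, `log1p` for non-negative right-skewed values), not the authors' exact code:

```python
import numpy as np
import pandas as pd

def iqr_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values inside [Q1 - k*IQR, Q3 + k*IQR];
    rows where the mask is False are dropped as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.between(q1 - k * iqr, q3 + k * iqr)

def reduce_skew(s: pd.Series) -> pd.Series:
    """log1p transform to reduce right skew (values assumed non-negative)."""
    return np.log1p(s)
```

For example, `df[iqr_mask(df["Installs"])]` keeps only rows whose install counts fall inside the interquartile fences, and `reduce_skew` can then be applied to the surviving heavy-tailed columns.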
10. Conclusion
Results
• OLS Regression gives the best R² score, 0.98, on the
testing data, but it may not be reliable for prediction, as
statistical techniques can be ambiguous on new data; its
error rate is a minimal 12%.
• Polynomial regression gives an R² score of
0.24, but its RMSE is large due to the non-linearity of the data.
• The bagged Decision Tree is unfit for predicting the
rating, with low accuracy and high error.
• Further hyperparameter tuning might improve SVR
accuracy, as its R² score is 0.08 and its error rate is 0.9,
but this requires more computing power.
• Overall, OLS Regression and the Support Vector Regressor
proved to be the better candidates for predicting the dependent
variable, but training on much more data is essential for learning
patterns, which would decrease the error rate.
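The model comparison described above can be sketched with Sklearn. This is an illustrative sketch only: it runs on synthetic stand-in data (the real features are the cleaned numeric Play Store columns), so the R² and RMSE values it produces will not match the scores reported in the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic stand-in for the cleaned app features and ratings.
X = rng.normal(size=(500, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit each candidate model and record (R^2, RMSE) on the held-out split.
results = {}
for name, model in [("OLS", LinearRegression()), ("SVR", SVR())]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = (r2_score(y_te, pred),
                     mean_squared_error(y_te, pred) ** 0.5)
```

The same loop extends to polynomial regression (via `PolynomialFeatures`) and a bagged Decision Tree; comparing the (R², RMSE) pairs side by side is what drives the model choice described above.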