Classification of Spotify song popularity
A PROJECT ON MACHINE LEARNING
Contents
• Problem Statement
• Dataset Description
• Methodology
• Key Findings
• Conclusion
Problem Statement
Spotify is one of the most popular platforms for listening to songs and podcasts.
Spotify uses a recommendation engine to recommend tracks in the Discover Weekly section according to the listener's preferences and the popularity of the track.
Of the two factors, track popularity is the more important one for the recommendation engine, because it also reflects the popular preferences of listeners, based on various variables and the heyday period of the track.
Dataset Description
114,000 rows and 21 columns
Target Variable
Popularity – This measure differs between older and newer releases because Spotify reshuffles it according to monthly listeners. It is a multiclass variable consisting of 3 categories.
Only the variables that affect the target variable are used. The other variables are not taken into consideration.
Key variables
Independent variables (continuous and categorical):
• Valence% – The positiveness of the song. Higher values sound cheerful and euphoric; lower values sound depressing and sad.
• Danceability% – How suitable the song is for dancing.
• Energy% – The amount of energy a song has.
• Acousticness% – Measures whether the track uses natural (acoustic) instruments or electronically produced music.
• Key – The musical key of the track, encoded as an integer: 0=C, 1=C#, and so on. There are 12 keys in total.
• Tempo – The speed of the song. The higher the tempo, the faster the song, and vice versa.
• Duration – The length of the song in seconds.
• Speechiness – The amount of spoken words (voices) present in the song.
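The integer encoding of Key described above can be decoded with a small lookup. This is a minimal sketch that assumes standard pitch-class numbering (0=C through 11=B); the names PITCH_CLASSES and key_name are illustrative and do not come from the original project.

```python
# Pitch classes in ascending semitone order, so that 0=C and 1=C#.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key: int) -> str:
    """Return the note name for an integer key code in the range 0..11."""
    if not 0 <= key < len(PITCH_CLASSES):
        raise ValueError(f"key must be between 0 and 11, got {key}")
    return PITCH_CLASSES[key]


print(key_name(0), key_name(1))
```

Because Key is a label rather than a quantity, it is usually treated as a categorical feature rather than a numeric one.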
Methodology
• Cleaning and preparing the data.
• Feature engineering.
• Primary model building.
• Reconsidering the variables and tuning the model.
• Building the model using different algorithms to check the stability of the prediction.
EDA Report
• Duplicate values, null values and typographical errors were present in the data.
• The data contained large outliers, which were treated by converting the affected variables into categories while keeping the classes balanced.
• Feature engineering such as clubbing, binning and rounding was applied to reduce the number of classes in the data.
• The target variable “Popularity” was initially a percentage from 0 to 100%. However, the original data description states that this is a classification problem, so the target was converted from a regression target into a multiclass one.
• The target variable was imbalanced. An oversampling technique was used to balance the classes.
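The slides do not say which oversampling technique was used. As one possible illustration, the sketch below performs plain random oversampling with replacement until every class matches the largest one. The function name random_oversample and the toy labels are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: row ids with an imbalanced three-class target.
toy_rows = list(range(8))
toy_labels = ["zero"] * 2 + ["low"] * 5 + ["high"] * 1
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used to measure accuracy.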
Conversion of the target variable from a percentage into three categories.
In the histogram below we can see that the target column has a peak at 0, which represents tracks with no popularity. This value is therefore assigned its own class, because it would otherwise impact the accuracy of the model. The new classes are ‘zero popularity’, ‘low popularity’ and ‘high popularity’.
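The three-class conversion can be sketched as a small mapping function. The slides do not state where the boundary between low and high popularity lies, so the high_cutoff parameter and its default of 50 are purely hypothetical; only the separate class for a score of 0 comes from the slide.

```python
def popularity_class(score: float, high_cutoff: float = 50) -> str:
    """Map a 0-100 popularity score to one of the three class labels.

    high_cutoff is an assumed threshold, not a value from the original project.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score == 0:
        return "zero popularity"  # the peak at 0 gets its own class
    if score >= high_cutoff:
        return "high popularity"
    return "low popularity"


print([popularity_class(s) for s in (0, 10, 80)])
```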
Algorithms report
Across the different algorithms, the accuracy does not fluctuate much, which indicates that the prediction is stable.
Highest accuracy = 85.94
Lowest accuracy = 79.7
Algorithm-wise accuracy:
• Random Forest Classifier = 85.94
• Decision Tree Classifier = 79.7
• CatBoost Classifier = 80.3
• XGBoost Classifier = 82.28
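The stability claim can be quantified from the reported figures alone. The sketch below copies the four accuracies from the slide (presumably percentages) and computes the gap between the best and the worst model; nothing is retrained here.

```python
# Accuracies as reported on the slide.
accuracies = {
    "Random Forest": 85.94,
    "Decision Tree": 79.7,
    "CatBoost": 80.3,
    "XGBoost": 82.28,
}

best = max(accuracies, key=accuracies.get)
worst = min(accuracies, key=accuracies.get)
spread = round(accuracies[best] - accuracies[worst], 2)

print(f"best={best}, worst={worst}, spread={spread} points")
```

The spread between the best and the worst model is 6.24 points.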
Key Findings
• There were 20 independent variables in the data, but only 8 of them affected the popularity of a song.
• Valence, danceability and energy together account for almost 50% of the feature importance for popularity.
• Song genre is one of the most important factors in an individual's taste in music that the recommendation engine considers. The most popular genre is Country-Specific, which consists of songs in each country's own language, indicating that people love their mother tongue when it comes to songs. The next most popular genre is EDM (Electronic Dance Music), because of its high valence, danceability and energy.
• The medium tempo range, between 100 and 140 bpm, is 2x as popular as any other tempo range. This tempo is used in EDM, Rock and Pop music, which are among the most popular genres.
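Binning tempo into ranges, as used in the last finding, might look like the sketch below. Only the medium range of 100 to 140 bpm is given in the slides; treating everything below it as "low" and everything above it as "high" is an assumption, as is the function name tempo_range.

```python
def tempo_range(bpm: float) -> str:
    """Bin a tempo in beats per minute into low, medium or high.

    Medium (100-140 bpm, inclusive) is taken from the slides.
    The low and high bins are assumed to be the remainder on each side.
    """
    if bpm < 100:
        return "low"
    if bpm <= 140:
        return "medium"
    return "high"


print([tempo_range(t) for t in (85, 120, 174)])
```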
The importance of each column related to the popularity
The figure on the left shows how much each feature affects the target column.
Valence + danceability + energy
16.7% + 16.0% + 15.5% = 48.2%
Using the first 8 columns or all 20 columns gives the same accuracy.
Conclusion
The dataset was somewhat complicated because it was difficult to establish the relationship between the target variable and the independent variables. However, with some cleaning and feature engineering, the final model was stable with high accuracy.
The most difficult part was that popularity was affected by the release date of the song, and the release date was not available in the data, so it looked like a case of endogeneity. Nonetheless, after separating the target variable into classes, this was resolved.
As a Data Scientist, I can conclude that the model trained on this dataset predicts accurately and is ready for deployment in the Spotify recommendation engine, to predict popularity correctly and so recommend the right tracks to listeners in future.
