Science of hit song
1. Science of Hit Songs
P.Sridhar FT181055
Tarun Siddharth FT181095
Nitin Uppal FT181054
Prakhar Srivastav FT182063
Ashutosh Singh FT181027
2. Motivations
• The ability to rationalize the choice of songs based on specific features excites our
curiosity.
• Features related to the basic components of the audio spectrum, and how each changes
over time. For example, the number of high or low frequencies in a sound determines its
perceived "brightness."
• Brightness, and how the time between beats evolves throughout a song, seem to have a
big influence.
• Theoretically popular songs share a certain set of features that make them appealing to
the majority of people, and we can test any new song against those success markers to
predict its commercial potential.
• It's important to note that hit-predicting algorithms tend to leave out certain crucial
factors for determining a song's commercial success, like marketing budget, the social
mood, the music video, or artist name recognition.
3. Problem Statement: We wish to predict whether or not a song will make it to the Top 10.
Data Set and Insights: UCI, Kaggle, and Advanced Business Analytics lectures
Model: a logistic regression model with all the remaining variables and Top10 as the dependent
variable, identifying the significant variables
Method – We use the subset function to split the data into a training set "SongsTrain" consisting of
all the observations up to and including 2009 song releases, and a testing set "SongsTest",
consisting of the 2010 song releases.
The training set has 7201 observations, which can be found by looking at the structure with
str(SongsTrain).
We will only use the variables in our dataset that describe the numerical attributes of the song in
our logistic regression model. So we won't use the variables "year", "songtitle", "artistname",
"songID" or "artistID".
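The year-based split and the exclusion of the identifier variables can be sketched in Python (the slides use R's subset function; the rows below are made-up stand-ins for the songs dataset, whose real training set has 7,201 observations):

```python
# Synthetic stand-in rows for the songs dataset; column names follow
# the slide's description, the values are invented for illustration.
songs = [
    {"year": 2008, "songtitle": "A", "loudness": -7.2, "Top10": 0},
    {"year": 2009, "songtitle": "B", "loudness": -4.1, "Top10": 1},
    {"year": 2010, "songtitle": "C", "loudness": -5.5, "Top10": 0},
    {"year": 2010, "songtitle": "D", "loudness": -3.9, "Top10": 1},
]

# Training set: releases up to and including 2009; test set: 2010 releases.
songs_train = [s for s in songs if s["year"] <= 2009]
songs_test = [s for s in songs if s["year"] == 2010]

# Drop the non-numeric identifier columns before modelling.
drop = {"year", "songtitle", "artistname", "songID", "artistID"}
features = [{k: v for k, v in s.items() if k not in drop and k != "Top10"}
            for s in songs_train]

print(len(songs_train), len(songs_test))  # 2 2
```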
4. Current research
1. Amazon/iTunes/etc. use collaborative filtering.
2. Hit Song Science clusters a provided set of songs against a database of top-30 hits to
predict success.
3. Relatable offers musical “fingerprint” technology.
4. Hitwizard takes into account the various sound parameters of a song (like BPM,
valence, tempo) and compares them against airplay data sourced from Dutch radio
stations and the local Spotify charts.
Our Approach
• Year: which year had the maximum number of songs released, and how many?
• Which year saw the maximum number of songs in the Top 10, and how many?
• Pitch and Loudness: generate a scatter plot of Pitch and Loudness.
• What inference can you derive from this plot?
• Use different models to classify the Top 10 songs.
• Cross-validate the efficiency of each model.
5. Variables
The variables included in the dataset either describe the artist or the song, or they are associated with the following song
attributes: time signature, loudness, key, pitch, tempo, and timbre.
year = the year the song was released
songtitle = the title of the song
artistname = the name of the artist of the song
songID and artistID = identifying variables for the song and artist
timesignature and timesignature_confidence = a variable estimating the time signature of the song, and the
confidence in the estimate
loudness = a continuous variable indicating the average amplitude of the audio in decibels
tempo and tempo_confidence = a variable indicating the estimated beats per minute of the song, and the confidence
in the estimate
key and key_confidence = a variable with twelve levels indicating the estimated key of the song (C, C#, . . ., B), and the
confidence in the estimate
energy = a variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch = a continuous variable that indicates the pitch of the song
timbre_0_min, timbre_0_max, timbre_1_min, timbre_1_max, . . . , timbre_11_min, and timbre_11_max = variables
that indicate the minimum/maximum values over all segments for each of the twelve values in the timbre vector
(resulting in 24 continuous variables)
Top10 = a binary variable indicating whether or not the song made it to the Top 10 of the Billboard Hot 100 Chart (1 if
it was in the top 10, and 0 if it was not)
6. Other Analytical Concepts
Naive Bayes
The naive Bayes classifier estimates the probability of a hit or non-hit based on the assumption that the features are
conditionally independent. This conditional independence assumption is represented by the equation
P(x | Y) = P(x1 | Y) × P(x2 | Y) × . . . × P(xM | Y),
• whereby each attribute set x = {x1, x2, . . . , xM} consists of M attributes.
• Because of the conditional independence assumption, the class-conditional probability for every combination of x does
not need to be calculated.
• Only the conditional probability of each xi given Y has to be estimated.
This offers a practical advantage, since a good estimate of the probability can be obtained without the need for a very
large training set.
Naive Bayes classifies a test record by calculating the posterior probability for each class Y:
P(Y | x) ∝ P(Y) × P(x1 | Y) × . . . × P(xM | Y).
Although this independence assumption is generally a poor assumption in practice, numerous studies show that naive
Bayes competes well with more sophisticated classifiers. In particular, naive Bayes is resistant to
isolated noise points and robust to irrelevant attributes.
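The per-class product of conditional probabilities described above can be sketched in Python (the tiny training set, feature names, and add-one smoothing choice are illustrative, not the deck's actual data):

```python
from collections import defaultdict

# Minimal naive Bayes sketch for binary hit (1) / non-hit (0) labels with
# categorical features, under the conditional independence assumption.
train = [
    ({"tempo": "fast", "loud": "high"}, 1),
    ({"tempo": "fast", "loud": "low"}, 1),
    ({"tempo": "slow", "loud": "low"}, 0),
    ({"tempo": "slow", "loud": "high"}, 0),
]

def fit(train):
    prior = defaultdict(int)        # class counts for P(Y)
    cond = defaultdict(int)         # (class, feature, value) counts for P(xi | Y)
    for x, y in train:
        prior[y] += 1
        for f, v in x.items():
            cond[(y, f, v)] += 1
    return prior, cond

def predict(x, prior, cond, n):
    best, best_p = None, -1.0
    for y, cy in prior.items():
        # posterior score: P(Y) * prod_i P(x_i | Y), with add-one smoothing
        p = cy / n
        for f, v in x.items():
            p *= (cond[(y, f, v)] + 1) / (cy + 2)
        if p > best_p:
            best, best_p = y, p
    return best

prior, cond = fit(train)
print(predict({"tempo": "fast", "loud": "high"}, prior, cond, len(train)))  # 1
```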
7. Advanced Thought process
• Language processing can include the distinct characteristics of every
language that are influenced by culture and history.
• Data mining can provide word clouds, which can be further aligned with
beats and other sounds to produce complete songs at the click of a button.
• This will take music creation to a new level, and the only thing that will create
distinction is the emotion of the singer, which is the next target of machine
learning.
8. Scatter Plot for Preliminary Investigation
The Pitch and Energy scatterplot shows that the top songs have pitch on the lower side, while
energy varies from 0 to 1. Hence pitch seems to be a more important deciding factor than
energy.
The Pitch and Loudness scatterplot shows that pitch is confined to the lower side while
loudness is on the higher side.
9. Random Forest (Inspection of Important Variables)
The Mean Decrease Gini index shows that timbre_0 is the most
important variable in determining whether a song is a hit or not.
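The quantity behind that ranking, Gini impurity and the impurity decrease of a split (averaged over all trees in a forest to give the mean decrease Gini), can be sketched in Python on made-up hit/non-hit labels:

```python
def gini(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    n = len(labels)
    p1 = sum(labels) / n
    return 1 - p1**2 - (1 - p1)**2

def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left/right."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Toy example: a feature that splits hits from non-hits perfectly
# removes all impurity, giving the maximum decrease of 0.5.
parent = [1, 1, 0, 0, 0, 1]
left, right = [1, 1, 1], [0, 0, 0]
print(gini_decrease(parent, left, right))  # 0.5
```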
10. Classification Techniques
A total of four models were built for each dataset using diverse classification techniques.
1. Tree
The tree data structure consists of decision nodes and leaves. The class value, in this case hit or
non-hit, is specified by the leaves, and the nodes specify a test of one of the features. When a path
from the root to a leaf is followed based on the feature values of a particular song, a predictive
rule can be derived.
A “divide and conquer” approach is used by the algorithm to build trees recursively. This is a
top-down approach, in which a feature is sought that best separates the classes, followed by
pruning of the tree.
Here also timbre_0 is shown as the most important determining factor.
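The recursive divide-and-conquer construction can be sketched in Python (no pruning, Gini as the split criterion; the single-feature rows are invented for illustration):

```python
def gini(ys):
    # Gini impurity of binary labels
    p = sum(ys) / len(ys)
    return 1 - p * p - (1 - p) * (1 - p)

def build(rows, ys):
    """Recursively pick the (feature, threshold) split with the lowest
    weighted Gini impurity; stop when a node is pure (a leaf)."""
    if len(set(ys)) == 1:
        return ys[0]                        # leaf: the class value
    best = None
    for f in rows[0]:
        for t in {r[f] for r in rows}:
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if best is None or score < best[0]:
                best = (score, f, t)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build([rows[i] for i in li], [ys[i] for i in li]),
            build([rows[i] for i in ri], [ys[i] for i in ri]))

def predict(tree, row):
    # Follow the path from the root to a leaf based on the feature values.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

rows = [{"timbre_0": 0.1}, {"timbre_0": 0.2}, {"timbre_0": 0.8}, {"timbre_0": 0.9}]
ys = [0, 0, 1, 1]
tree = build(rows, ys)
print(predict(tree, {"timbre_0": 0.85}))  # 1
```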
11.
12. Logistic Regression
The output of a logistic regression is fhit(si) = 1 / (1 + exp(−(β0 + β1xi1 + . . . + βMxiM))),
the probability that a song si with M features xij is a hit. This probability follows a logistic
curve, as can be seen in the figure above. A cut-off point of 0.5 determines whether a song is
classified as a hit or a non-hit. AUC = 0.65 for the train dataset and AUC = 0.67 for the test
dataset.
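The scoring and 0.5 cut-off can be sketched in Python (the coefficients and feature values below are hypothetical, not the fitted model from the slides):

```python
import math

def f_hit(x, b0, b):
    """Logistic probability: 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = b0 + sum(bj * xj for bj, xj in zip(b, x))
    return 1 / (1 + math.exp(-z))

b0, b = -1.0, [2.0, 0.5]   # hypothetical fitted coefficients
song = [0.9, 0.4]          # hypothetical feature values x_j for one song
p = f_hit(song, b0, b)     # z = 1.0, so p ~ 0.731
print(p > 0.5)             # True -> classified as a hit at the 0.5 cut-off
```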
13. Comparison of Different Models
Logistic regression accuracy: 0.8553649
CART accuracy: 0.8632849
Random Forest accuracy: 0.8678107
Naive Bayes accuracy: 0.8306619
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 1890 265
1 46 70
Accuracy : 0.8631
95% CI : (0.8482, 0.8769)
No Information Rate : 0.8525
P-Value [Acc > NIR] : 0.08115
The highest accuracy is exhibited by Random Forest.
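The accuracy and no-information rate above can be recomputed directly from the confusion matrix counts (a normal-approximation confidence interval is used here for simplicity; R's caret reports an exact binomial interval, so its bounds differ slightly):

```python
import math

# Confusion matrix counts: rows are predictions, columns are reference labels.
tn, fn = 1890, 265   # predicted 0: reference 0, reference 1
fp, tp = 46, 70      # predicted 1: reference 0, reference 1

n = tn + fn + fp + tp
accuracy = (tn + tp) / n                 # fraction of correct predictions
nir = max(tn + fp, fn + tp) / n          # share of the majority class

# Approximate 95% CI half-width for the accuracy.
half = 1.96 * math.sqrt(accuracy * (1 - accuracy) / n)

print(round(accuracy, 4), round(nir, 4))  # 0.8631 0.8525
```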
14. Conclusion
Multiple models were built that can successfully predict whether a song will be a top 10 hit versus a
lower positioned song. The original dataset was extracted from the Echo Nest, and the dataset used here is a part
of that larger, original dataset. Standard audio features were used, as well as more advanced features that
capture the temporal aspect. This resulted in a model that could accurately predict top 10 hits. This
research shows that the popularity of songs can be learnt from the analysis of music signals.
Finally, by comparing different classifiers with significantly different performance, the best
model could be selected. Here the best model came out to be Random Forest, with the highest accuracy.