Back to the Future: Evolution of Music Moods From 1992 to 2022
MASTER OF APPLIED DATA SCIENCE: MILESTONE I PROJECT REPORT
FEBRUARY 2023
ANDRIA LESANE
PAIGE VAUTER
Table of Contents

- Executive Summary
- The Big Idea
- Our Approach
- Our Insights
- Conclusion
- Statement of Work
Executive Summary

THE BIG IDEA

While we know that music impacts us individually, we're interested in discovering whether, collectively, we can find mood trends in popular music over time. For this evaluation, we take our "Data Science DeLorean" back through the top 100 rated songs in the US over the last 30 years.

From the 2022 IFPI global survey of music engagement habits [1]:
- 69% of respondents reported music is integral to their mental and physical wellbeing.
- 46% of respondents use subscription audio streaming services.
- Respondents spend an average of 20.1 hours per week listening to music.

OUR APPROACH

Data Collection: We scraped the Billboard wiki pages for the top 100 songs of each year using the Wikipedia API, then retrieved the audio features of each song on the master Billboard chart list, such as "valence" and "energy", using the Spotify Web API. To ease processing time and avoid Spotify request timeouts, the data was exported to a CSV file. The following attributes were obtained for each song: track title, artist(s), ranking, chart year, valence, energy, danceability, key (pitch), and loudness.

Mood Classification: The study uses the 2D Emotion model to classify song moods based on the "valence" and "energy" level of each song. A total of eight emotions were used to categorize songs: Alert, Excited, Happy, Sad, Depressed, Relaxed, Calm, and Afraid.

Data Manipulation: Pandas DataFrame manipulation functions were used to add, edit, compare, drop, and summarize the 3,098 records in our master dataset. Data cleaning was performed manually to ensure the audio features retrieved from Spotify were correct.

OUR INSIGHTS

Model Fit: Through exploratory data analysis (EDA), research, and review, we found that the chosen model was a good fit and accurately categorized the data set.

Mood Trends: Overall, no specific mood trends stood out during this time period; changes in mood distribution in the Top 100 songs seemed cyclical. Top 100 songs are more likely to be Alert, Excited, or Relaxed. Excited songs have dominated in the last 5 years, while Happy songs have diminished significantly.

Future Considerations: We found this study to be a solid foundation, lending itself to future investigation in several areas: comparing data by genre, and comparing additional mood models or using machine learning to build a classification model.

ETHICAL TAKEAWAY - PRIVACY

How would you feel if a company tracked your mood? "You seem sad, so I made you a mixtape <3" - would you be appreciative, or creeped out?
The Big Idea
OBJECTIVE

Music is a massive part of every culture. We use music to express ourselves, calm down, meditate, energize or motivate ourselves, tune out noise and distraction, and for many other reasons in our daily lives. While we know that music impacts us individually, we're interested in discovering whether, collectively, we can find mood trends in popular music over time.

Our study focuses on the significant moods of modern American popular music and their trends by year from 1992 to 2022. From our analysis, we expect to understand:

- the features, patterns, and anomalies present in our master dataset;
- how closely our songs of interest follow our chosen Music Mood Model;
- how song characteristics correlate (i.e., artist, year released, popularity ranking, loudness, pitch, etc.);
- the emerging moods (if any) over the 30-year period.
MOTIVATION

Globally, in 2022, 46% of people listened to music using a subscription audio streaming service [1]. As more people access large music libraries via streaming, music classification systems have become critical in fostering music discovery.

One of the more popular classification methods is coined "music moods." The music mood method provides a more accurate recommendation system for popular music streaming apps. It emphasizes how users interact with music and how they feel when listening, which improves the user experience of an audio streaming app [2].
Data Sources
PRIMARY DATA SOURCE: SPOTIFY WEB API
Spotify is a Swedish audio streaming service provider with over
100 million songs in its music library.
With the Spotify Web API, we utilize the Tracks resource to access both the Get Track and Get Track's Audio Features endpoints and obtain the following data features:
USING SPOTIFY WEB API
- Track Title & Artist
- Energy: represents a perceptual measure of intensity and activity. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. A standardized measure from 0.0 to 1.0.
- Valence: describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). A standardized measure from 0.0 to 1.0.
- Loudness: the overall loudness of a track in decibels (dB).
- Key: the key the track is in. Integers map to pitches using standard Pitch Class notation [3], e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.
- Danceability: describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
The Spotify Web API is available at https://api.spotify.com/v1, with endpoints for Track and Audio Features. We used the Spotipy python library, available at https://spotipy.readthedocs.io/en/2.22.1/#, to access the Spotify Web API.

The Spotify Web API requires authentication, so we first set up an access token. The data is returned as a JSON blob that we parse. Using the Track endpoint, we first retrieved the Track ID and Artist for each song. We then used the Track ID to query the Audio Features endpoint to retrieve the following: Energy, Loudness, Valence, Tempo, Danceability, and Key (pitch).
USING SPOTIFY DATA

The data retrieved was used to determine each song's mood. Using this method, we returned track and mood data for 3,098 of the 3,100 songs as a list of dictionaries (one dictionary entry per track). The list was converted to a dataframe for additional cleaning and processing.

In order to avoid sharing authentication credentials, ease processing time, and avoid Spotify request timeouts, we saved this data in a .csv file.
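The retrieval step described in this section can be sketched with Spotipy as follows. This is a minimal sketch, not the study's actual notebook code: the client setup is shown only in comments (credentials are placeholders), `fetch_track` takes an already-authenticated client, and `extract_audio_features` simply picks out the fields this study uses from the JSON responses.

```python
def extract_audio_features(track, features):
    """Keep only the attributes this study uses from the two
    JSON responses (Track endpoint + Audio Features endpoint)."""
    return {
        "track_id": track["id"],
        "title": track["name"],
        "artist": track["artists"][0]["name"],  # primary artist
        "energy": features["energy"],
        "valence": features["valence"],
        "loudness": features["loudness"],
        "key": features["key"],
        "danceability": features["danceability"],
        "tempo": features["tempo"],
    }

def fetch_track(sp, title, artist):
    """Query the Track (search) endpoint for the closest match,
    then the Audio Features endpoint, using a Spotipy client."""
    hit = sp.search(q=f"track:{title} artist:{artist}",
                    type="track", limit=1)["tracks"]["items"][0]
    feats = sp.audio_features([hit["id"]])[0]
    return extract_audio_features(hit, feats)

# To run against the live API (requires credentials):
#   import spotipy
#   from spotipy.oauth2 import SpotifyClientCredentials
#   sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
#       client_id="...", client_secret="..."))
#   row = fetch_track(sp, "Work", "Rihanna")
```

Keeping the field extraction in its own helper makes that step checkable without network access or credentials.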
Data Sources
SECONDARY DATA SOURCE: WIKIPEDIA API
Spotify only has top song charts going back as far as 2016. To find
historical data of top popular songs, we will look to Wikipedia.
With the Wikipedia API, we can find the URL address of Billboard Wiki
pages containing information on popular American music.
The study specifically looks at the Billboard Year-End Hot 100, the music industry's standard year-end record chart for songs in the United States, published annually by Billboard magazine. Chart rankings are based on sales (physical and digital), radio play, and online streaming in the United States.
We will use this data set to retrieve popular songs in the United States
from 1992 - 2022.
ABOUT BILLBOARD

Billboard is an American music and entertainment publication considered a reputable industry source for music rankings. The Billboard Year-End Chart, published by Billboard, denotes the top songs of each year as determined by the publication's weekly charts. Since 1946, Year-End charts have existed for the top songs in various music genres, including pop, R&B, and country. Billboard's "chart year" runs from the first week of December of the preceding year to the final week of November.
HOW RELIABLE IS WIKIPEDIA?

Wikipedia's verifiability policy requires information to be cited to a trusted third-party source; uncited information is highly likely to be deleted. We can reasonably trust that information on a Billboard wiki page is cited to the original print publication or official website.
USING WIKIPEDIA API

The English Wikipedia API is available at https://en.wikipedia.org/api/rest_v1/ and returns data in JSON format. Using the search endpoint, we obtain the URL of the target Billboard wiki page.

The Requests library, an HTTP request library for Python, then uses that URL to retrieve the text content of the wiki page in HTML format. The BeautifulSoup python library, an HTML/XML parser, reads the chart information from the Billboard wiki pages. Each Billboard Year-End Hot 100 chart returns the following data attributes for each song: Track Name, Artist(s), Ranking, and Chart Year.

From scraping the wiki pages, a list of 3,100 top-ranked popular songs between 1992 and 2022 was composed.
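The scraping step can be sketched as below. The page-title pattern and the `wikitable` class are assumptions about how the Year-End pages are structured (not taken from the study's notebooks), so the HTML parsing is kept in a separate helper that can be checked offline against a small sample table.

```python
import pandas as pd
from bs4 import BeautifulSoup

def parse_chart(html, year):
    """Read ranking/title/artist rows out of the first chart table
    found in the page HTML and return them as a dataframe."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")  # assumed class
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True)
                 for td in tr.find_all(["td", "th"])]
        if len(cells) >= 3:
            rows.append({"Ranking": cells[0], "Title": cells[1],
                         "Artist(s)": cells[2], "Chart Year": year})
    return pd.DataFrame(rows)

def year_end_hot_100(year):
    """Fetch and parse one Year-End Hot 100 chart (network call;
    the URL pattern is an assumption to verify against Wikipedia)."""
    import requests
    url = ("https://en.wikipedia.org/wiki/"
           f"Billboard_Year-End_Hot_100_singles_of_{year}")
    return parse_chart(requests.get(url, timeout=30).text, year)
```

Looping `year_end_hot_100` over 1992-2022 and concatenating the results would produce the master dataframe described above.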
Data Manipulation
DATA COLLECTION WORKFLOW

STEP 1 - Billboard Wiki Pages (Web Scraping)

A function was created to retrieve the Billboard Year-End Hot 100 chart of a specified year and return it as a pandas dataframe. An iterative loop over this function generated a master dataframe containing the top 100 songs of each year from 1992 to 2022. To capture the primary artist of each track, a split was performed on the "Artist(s)" column of the master Billboard dataframe; typically the primary artist is the first name listed on the track, with featured artists listed after.

STEP 2 - Spotify Track List

Using the dataframe of Billboard's Hot 100 lists of top songs in the US from 1992 to 2022, we searched for matching tracks in Spotify's music library. The search was conducted using the track's title and the corresponding primary artist. The search function returned the closest match along with a python dictionary of song attributes: Track ID, Track Title, and Primary Artist Name.

STEP 3 - Retrieving Audio Features

The Spotify dictionary was then used to retrieve each track's audio features (song mood characteristics) via the track ID. A function retrieved each track's audio features, along with the track title and primary artist, and returned the information in a pandas dataframe. Finally, the Spotify dataframe and the Billboard dataframe were exported into CSV format for future data cleaning steps.
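The primary-artist split from Step 1 might look like the sketch below. The separator list is a hypothetical heuristic, not the study's exact rule; it would wrongly split band names such as "Florence and the Machine", which is one reason manual cleaning was still needed.

```python
import re
import pandas as pd

# Separators commonly seen in the chart's "Artist(s)" column.
# "featuring" must come before "feat" so the longer form wins.
SEPS = re.compile(r"\s+(?:featuring|feat\.?|and|&|with)\s+",
                  re.IGNORECASE)

def primary_artist(artists):
    """Return the first artist listed, per the heuristic above."""
    return SEPS.split(artists)[0]

# Toy rows standing in for the master Billboard dataframe.
billboard = pd.DataFrame({
    "Title": ["Work", "One Sweet Day"],
    "Artist(s)": ["Rihanna featuring Drake",
                  "Mariah Carey and Boyz II Men"],
})
billboard["Primary Artist"] = billboard["Artist(s)"].map(primary_artist)
```

The resulting "Primary Artist" column is what feeds the Spotify search in Step 2.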
MOOD CLASSIFICATION

What is song mood classification?

Song mood classification models categorize a song into a particular mood or emotional state. These models typically use audio features such as pitch, tempo, and timbre, as well as lyrics and metadata, to analyze a song and predict its mood [4]. There are various approaches to building these models, ranging from supervised learning on annotated datasets to unsupervised methods such as clustering and dimensionality reduction. Popular moods used in classification include happy, sad, angry, and calm. Accurately predicting song mood remains challenging, as different listeners may have subjective interpretations of the same song.

Our Chosen Model: 2D Emotion Model (from Munoz-de-Escalon, 2017 [5])

The 2D emotion model visualizes emotions in a two-dimensional space based on two primary dimensions: valence and arousal. Valence refers to the pleasantness or unpleasantness of an emotion and ranges from negative (unpleasant) to positive (pleasant). Arousal refers to the level of intensity or activation associated with an emotion and ranges from low (calm) to high (excited) [5].
The mood classification table below was derived from the 2D Emotion wheel; each song was assigned a mood based on it. The mood levels (High, Medium, Low) correspond to the following ranges:

- Low: 0.0 - 0.32
- Medium: 0.33 - 0.65
- High: 0.66 - 1.0

Other classification models were explored but were difficult to translate using Spotify metrics. Some had too many emotional tags, which made mood classification overly complex. Focusing on the valence and energy attributes simplified classification, as standardized values for both are available for most songs in Spotify's music library.
SPOTIFY SONG MOOD CLASSIFICATION TABLE

Mood        Valence   Energy (Arousal)
Alert       Medium    High
Excited     High      High
Happy       High      Medium
Relaxed     Medium    Medium
Calm        Medium    Low
Sad         Low       Low
Depressed   Low       Medium
Afraid      Low       High
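The table translates directly into a small lookup. This sketch assumes the range boundaries are inclusive on the upper end (0.32 / 0.65), which the report does not state exactly; the combinations missing from the table (such as high valence with low energy) come back as unclassified.

```python
def level(x):
    """Bucket a 0.0-1.0 Spotify metric into the study's three
    ranges: Low (0.0-0.32), Medium (0.33-0.65), High (0.66-1.0)."""
    if x <= 0.32:
        return "Low"
    if x <= 0.65:
        return "Medium"
    return "High"

# (Valence level, Energy level) -> mood, per the table above.
MOODS = {
    ("Medium", "High"):   "Alert",
    ("High",   "High"):   "Excited",
    ("High",   "Medium"): "Happy",
    ("Medium", "Medium"): "Relaxed",
    ("Medium", "Low"):    "Calm",
    ("Low",    "Low"):    "Sad",
    ("Low",    "Medium"): "Depressed",
    ("Low",    "High"):   "Afraid",
}

def classify(valence, energy):
    """Return the mood label, or None for out-of-bounds
    combinations not covered by the table."""
    return MOODS.get((level(valence), level(energy)))
```

Applying `classify` across the valence and energy columns of the master dataframe yields the mood column used throughout the rest of the study.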
HANDLING MISSING/INCORRECT DATA

Dropped Data Criteria:
Songs listed in the curated Billboard list but not found in Spotify were dropped. Two songs in our dataset fell into this category, which is ~0.06% of our data.

Songs with no Mood Classification:
A limitation of our 2D Emotion model is that songs with high valence and low energy are considered "out of bounds" and have no assigned mood category. Only two songs in our dataset fell into this category.

Using the Spotify Search Function:
The Spotify Web API search function appears to be less sophisticated than the search system available in the mobile and web versions of the Spotify application. When manually searching for records that were incorrectly matched, the function worked better when the song title or artist name was truncated. Additionally, if the artist's stage name had changed since the song's original release, using the artist's current stage name yielded closer matches, e.g., Puff Daddy (1990s) → Diddy (current).
CHALLENGES IN DATA MANIPULATION

Incorrect song matches:
The CSV files for the Billboard top song list and the Spotify list were concatenated into a single dataframe based on their shared index. To determine whether the Spotify search function pulled in the correct record, a one-to-one boolean match of the track title and primary artist from each data source was performed. Some of the closest-match songs pulled from the Spotify search function were incorrect; this affected 18 records in our dataset. Each was manually corrected to obtain the correct audio features.
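The one-to-one boolean match might be sketched as follows, with toy rows standing in for the two exported CSVs (read back with `pd.read_csv` in the real pipeline); column names here are illustrative:

```python
import pandas as pd

# The shared index is the row order of the master Billboard list.
billboard = pd.DataFrame({"Title": ["Work", "Hello"],
                          "Primary Artist": ["Rihanna", "Adele"]})
spotify = pd.DataFrame({"sp_title": ["Work", "Hello (Live)"],
                        "sp_artist": ["Rihanna", "Adele"]})
merged = pd.concat([billboard, spotify], axis=1)

# Case-insensitive comparison of title and primary artist; rows
# that fail the check are flagged for manual correction.
merged["match"] = (
    merged["Title"].str.lower().eq(merged["sp_title"].str.lower())
    & merged["Primary Artist"].str.lower()
        .eq(merged["sp_artist"].str.lower())
)
mismatches = merged[~merged["match"]]
```

In the study, 18 such flagged rows were then corrected by hand.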
Data Source Limitations:
A point of interest was to determine whether a song's genre correlates with mood. At the time of this study, the Spotify Web API does not provide information on a track's genre; genre information is available only for albums and artists.
Insights
MOOD MODEL EVALUATION

Mood Model Accuracy

The accuracy of the mood model was evaluated manually for this study, which solidified our confidence in the results. Example: "Work" by Rihanna feat. Drake:
- a catchy, popular song to sing or dance to
- assumed to belong in a positive mood category
- categorized by the model as a Depressed mood
- research into the lyrics proved the model's assignment correct

Mood Model Visual

The visualization of valence and energy by mood gave further insight into the model's construction. The following findings were interesting:
- Sad, Relaxed, and Excited songs have energy and valence levels in the same spread, but at different ranges.
- Alert and Happy songs are in the same range of values, but have opposite energy and valence.
- Alert and Depressed songs both have valence levels just below their energy levels, but the overall range is higher for Alert songs.

This evaluation gave further insight into how the model categorizes mood.

Mood Model Coverage

A major consideration in selecting a mood model was ensuring that the majority of songs could be categorized; if too many variables were considered, too many combinations of metrics would exist to map to each mood. Model coverage was evaluated by manually reviewing data points, searching for blank or missing data, and visualizing mood assignment. Applying the chosen mood model to the Billboard Top 100 data yields two observations, visible in the accompanying visualization:

1. The data is spread across all possible moods; no mood is missing from the Top 100 charts.
2. The frequency with which each mood appears in Top 100 songs varies; for example, Sad, Calm, and Afraid songs appear with much less frequency than others.

Evaluating for coverage, this mood model seems a good fit for the purposes of this study.
MOOD METRICS CORRELATION EVALUATION

Correlation Study

A correlation study is often part of exploratory data analysis. Calculating the correlation between variables shows how closely they may or may not be related. Correlation coefficients (r-values) span from -1 to +1 and indicate the strength of the relationship, as outlined in the table below. Applying a correlation study to the musical mood and track attributes can show whether other metrics are related to the chosen mood category or should be considered in the model. The chosen model selected just two metrics but, as mentioned, many models choose more. A heatmap is one way to visualize the r-values by strength.

Absolute value of r   Strength
> 0.7                 Strong
0.5 - 0.7             Moderate
0.3 - 0.5             Weak
0 - 0.3               Very weak / none
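As a sketch of this step (toy values, not the study's data; the 0.5 and 0.3 boundaries are assigned to the stronger bucket as one reading of the table):

```python
import pandas as pd

def strength(r):
    """Bucket an r-value per the strength table above."""
    a = abs(r)
    if a > 0.7:
        return "Strong"
    if a >= 0.5:
        return "Moderate"
    if a >= 0.3:
        return "Weak"
    return "Very weak / none"

# Toy numeric frame standing in for the master dataset's audio
# features; .corr() gives the pairwise Pearson r-values that the
# heatmap visualizes, e.g. seaborn.heatmap(df.corr(), annot=True).
df = pd.DataFrame({"energy":   [0.9, 0.4, 0.7, 0.2],
                   "loudness": [-4.0, -9.0, -6.0, -12.0],
                   "valence":  [0.8, 0.3, 0.9, 0.4]})
corr = df.corr()
```

The mood label itself is excluded from `df`, matching the study's decision to correlate only the underlying metrics.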
Normally, a correlation study looks for strong relationships between variables. In this study, however, a strong relationship between two metrics would imply that they are already accounting for the same underlying signal.

Example: Loudness & Energy
- An r-value of 0.67 is moderate, bordering on strong.
- Energy values (according to Spotify) include loudness in their calculation.
- Including both metrics to classify mood would overcompensate for loudness and would likely not improve the model.
Notable Data Points

Danceability & Tempo (r = -0.18) have a weak negative relationship, implying that as the danceability measure increases, tempo tends to decrease. The r-value is very low and the relationship weak, but the tendency toward a negative relationship is interesting to note. The same could be said for Danceability and Energy (r = -0.032).

Data points with very low correlation to Energy and Valence, like Pitch and Tempo, may be good metrics to consider for future model iterations or improvements. This correlation study did not include the mood classification itself, since it was derived directly from two of these metrics; the goal was to evaluate which other metrics might add to the existing model.
MOOD TRENDS OVER TIME
The Big Idea

A major goal of this study was to evaluate mood trends over time. After studying the model itself, we wanted to see what patterns and trends could be seen in the data over the last 30 years. Based on this study, we drew several key pattern-based conclusions:

- Alert, Excited, and Relaxed songs make up the majority of Top 100 songs over the last 30 years.
- The spread by mood type seems much more varied 30 years ago than in the early 2000s, but appears to be returning; the patterns appear cyclical.
- The last 5-7 years have shown unique trends:
  - Relaxed music has seen a surge while Excited music has significantly dropped off.
  - Depressed music is becoming popular again (it dropped off around 2000).
PANDEMIC IMPLICATIONS?

Could the pandemic explain changes in musical mood? Is this why Depressed and Happy songs have seen an increase since 2020? We also see almost no Sad or Afraid songs, and fewer Relaxed songs, in this period. Overall the patterns seem cyclical, but it will be interesting to assess in the future whether anything changed as of 2020.
Conclusion
POSSIBLE FUTURE CONSIDERATIONS

This study lends itself to several areas for expansion:

- Using machine learning to develop a more sophisticated accuracy measurement for mood classification models.
- Comparing "top" songs for radio versus streaming services or another consumption model.
- Building a predictive classification model that considers more than 30 years of data.

With all these possibilities in mind, we are confident in the study we conducted, and it lays a solid foundation for future analysis.
ETHICAL IMPLICATIONS

While the goal of music mood classification models is to improve the user experience of audio streaming services, their use raises several ethical concerns that need to be carefully considered, including:

- Privacy: Music mood classification relies on analyzing a person's music choices, which could be seen as an invasion of their privacy. People may not want their emotions to be tracked or analyzed in this way, especially if the data is used for marketing or other purposes.
- Bias and discrimination: The algorithms used for music mood classification may be biased in their analysis, potentially leading to discrimination.
- Stigmatization: Mood classification may reinforce stereotypes and stigmas about mental health conditions or emotional states by using labels like "depressed" or "anxious".
- Inaccuracies: Music mood classification is not always accurate and may make incorrect assumptions about a person's emotional state based on their music choices.
- Lack of consent: If music mood classification is used without a person's consent, it could be seen as a violation of their autonomy and agency over their own emotional experiences.

Overall, it is important to approach the use of music mood classification with caution and consideration of these ethical concerns, and to ensure that the technology is used responsibly, with transparency and accountability to users.
Statement of Work
CONTRIBUTIONS

Andria Lesane: Web scraping of Billboard wiki pages, mood classification research/selection, emotion assignment of songs, data cleaning, Sankey visualization, and report writing.

Paige Vauter: Spotify API data collection, data cleaning, data aggregation and manipulation, visualizations (heatmap, time series, stacked bar chart, strip plot, and scatter plot), and report writing.

Project collaboration was made possible through JetBrains Datalore, a collaborative data science platform that allows for real-time collaboration on Jupyter Notebooks.

ACKNOWLEDGEMENTS

We would like to thank our project advisor, Oleg Nikolsky, and the rest of the instructional team of the Winter 2023 iteration of the SIADS Milestone I course for their valuable guidance and support.

REFERENCES

1. IFPI Global Data and Analysis. "Engaging with Music 2022". Report. IFPI.org.
2. N. Hughes. "Spotify wants to suggest songs based on your speech and emotional state". CyberNews.com.
3. K. Kosbar. "Pitch Names - A few more details". EE3430 Digital Communications Course, Missouri University of Science and Technology. https://web.mst.edu/~kosbar/ee3430/ff/fourier/notes_pitchnames.html
4. Nuzzolo, M. (n.d.). "Music Mood Classification". Electrical and Computer Engineering Design Handbook, Tufts University. https://sites.tufts.edu/eeseniordesignhandbook/2015/music-mood-classification/
5. Munoz-de-Escalon, E. & Canas, J. (2017). "Online measuring of available resources". H-Workload 2017: The First International Symposium on Human Mental Workload, Dublin Institute of Technology. https://doi.org/10.21427/D7DK96

NOTEBOOKS

6. A. Lesane & P. Vauter. "Back to the Future: Evolution of Music Moods from 1992 to Present - Data Collection". Jupyter Notebook.
7. A. Lesane & P. Vauter. "Back to the Future: Evolution of Music Moods from 1992 to Present - Exploration". Jupyter Notebook.