Visualizing music artists’ main topics and overall sentiment by
analyzing lyrics
Abstract
This paper describes the process of visualizing
music artists’ main topics and overall sentiment
by analyzing their lyrics. While artists translate
their lyrics into sound, we address the problem of
translating them into images by automatically
generating a moodboard from written songs.
Visualized lyrics can be valuable for deaf people,
and lyric analysis may be of commercial use to
music software companies. The approach involves
scraping the lyrics from the web, preprocessing
the scraped texts, and analyzing their topics and
overall sentiment. Based on the main topics,
corresponding images are scraped from the web and
placed on the moodboard. Based on the sentiment
score, a matching level of saturation is applied to
these images. Finally, some results are presented
and discussed, followed by a link to the final
application.
1 Introduction
Every music artist is unique, just like every song
is unique. Songs can express positive or negative
emotions through their sounds and lyrics. While
humans can comprehend the emotions of a song
by simply listening to it, a different approach is
required for a Natural Language Processing sys-
tem to perform this task, since only the textual
component of lyrics can be utilized. Without
sound, it is more difficult to extract the sentiment
of a song, since all that is left are words. Another
problem is that lyrics do not follow the same
syntactic rules as informative texts. Their
structure is closer to that of poetry. Lyrics can be
ambiguous, since they often contain metaphors,
idioms and polysemous words; the interpretation is
left to the listener, and different people may
interpret the same song differently. Enabling an
automated system to interpret lyrics is therefore
a demanding task. However, lyrics might provide an
automated system with enough information about a
song to detect its main topics and overall
sentiment. This research therefore examines
whether Natural Language Processing can extract
the sentiment of songs from their lyrics alone. It
also examines how to extract the main topics of an
artist’s songs that are embedded in the lyrics.
The challenge and main problem statement of
this paper is to express the extracted sentiment
and topics of the songs of an artist in a different
way than sound: visually, by generating a mood-
board. A moodboard as visualization method is
chosen because it has the convenience of carry-
ing the same ambiguity as a song, which makes it
a perfect fit for the domain.
The resulting moodboard of an artist’s main topics
and overall sentiment can be useful for several
purposes. It would offer deaf people an opportunity
to experience music through their visual sense,
which can bring them closer to a domain that they
are often distanced from. In addition, these
moodboards could be useful for music software
companies such as Spotify. Spotify is well aware
that music and mood are intertwined, and responds
to this by offering playlists based on moods. The
proposed moodboard of an artist’s topics and
sentiment could be embedded in these playlists as
an additional visual element.
Kayleigh Beard
Vrije Universiteit
Amsterdam, Nederland
k.l.beard@student.vu.nl
Anita Tran
Vrije Universiteit
a.v.t.t.tran@student.vu.nl
Nathalie Post
Vrije Universiteit
n3.post@student.vu.nl
Laila van Ments
Vrije Universiteit
l.ments@student.vu.nl

2 Data Used
2.1 Lyrics
The data used consists of lyrics from the website
‘http://www.songteksten.nl’. However, no data
is stored within the application itself. A user of
the application can query an artist, and only after
this query is the data retrieved by scraping all of
the artist’s lyrics from ‘http://www.songteksten.nl’
using ‘Scrapy’, a Python framework for scraping
websites.
2.2 Images
Based on the queried artist, and the results of the
topic analysis, images are scraped from photo-
sharing website Flickr, ‘http://www.flickr.com’.
3 Methodology
This section describes the four separate steps that
take place within the application. After the lyrics
are scraped, the lyrics are preprocessed, as de-
scribed in Section 3.1. Subsequently, the main
topics of the lyrics are extracted (Section 3.2)
and sentiment analysis is conducted (Section
3.3). Finally, the results from the topic analysis
and sentiment analysis are used in order to gen-
erate a moodboard (Section 3.4).
3.1 Preprocessing the text
The scraped lyrics are preprocessed in order to
normalize the text. The first step of prepro-
cessing consists of sentence segmentation and
tokenization. After this, Part-Of-Speech tagging
is applied, in which each word of the text is
marked with a part of speech, based on its defini-
tion and context. Finally, redundant characters
and punctuation marks (colons, apostrophes, hy-
phens, strokes, parentheses and square brackets)
are removed from the text (Fokkens, 2015).
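As an illustration, the segmentation, tokenization and character-removal steps could be sketched as follows. This is a minimal sketch using only the Python standard library, not the application’s actual code. Treating each lyric line as a sentence, lowercasing the tokens, reading ‘strokes’ as slashes, and the function name preprocess are all assumptions made here. Part-Of-Speech tagging is left out, since it requires a trained tagger.

```python
import re

# Characters listed in this section for removal: colons, apostrophes
# (straight and typographic), hyphens, strokes (assumed to mean slashes),
# parentheses and square brackets.
REMOVE_CHARS = str.maketrans("", "", ":'\u2019-/()[]")


def preprocess(lyrics):
    """Return one list of lowercase tokens per non-empty lyric line.

    Assumption: a line of lyrics is treated as one 'sentence'.
    """
    sentences = []
    for line in lyrics.splitlines():
        cleaned = line.translate(REMOVE_CHARS)
        tokens = re.findall(r"\w+", cleaned.lower())
        if tokens:
            sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    print(preprocess("[Chorus]\nDon't stop (never)\n\nrock-n-roll: yeah"))
```

Note that removing hyphens and apostrophes joins the surrounding characters (‘rock-n-roll’ becomes ‘rocknroll’), which may or may not match the behaviour of the original application.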
3.2 Topic analysis
The most meaningful words of a sentence are its
keywords. In order to extract the main topics
from lyrics, the goal was to filter the keywords
out of the lyrics. Keywords are most often nouns,
and therefore only nouns were extracted from the
text (Common Noun, Proper Noun, Proper Noun
Singular Form and Proper Noun Plural Form) and
stored in a list.
Not all the extracted nouns in this list are actual
keywords, and some of the extracted nouns
should not be part of the topic analysis. The
NLTK ‘Stopwords’ corpus is used in order to
filter out these words. However, the ‘Stopwords’
corpus does not filter out all redundant words,
especially the domain dependent words (such as
‘chorus’, and ‘verse’). During a thorough test
procedure of the application, an additional stop-
words list was generated in order to filter out
those redundant words as well.
After a list of proper keywords is established, a
frequency distribution of the words in this list is
generated. From this frequency distribution the ten
most common words are taken, which represent the
ten main topics of the queried artist (Bird, Klein
and Loper, 2009).
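The noun filtering and frequency count described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the application’s code: the input is assumed to be a list of (word, tag) pairs with Penn Treebank-style tags (where all noun tags start with ‘NN’), the two small stop lists are hypothetical stand-ins for the NLTK ‘Stopwords’ corpus and the hand-built domain list, and the function name main_topics is invented for this example.

```python
from collections import Counter

# Hypothetical stand-ins: the paper uses the NLTK 'Stopwords' corpus plus a
# hand-built list of domain words, neither of which is reproduced here.
GENERAL_STOPWORDS = {"i", "you", "it", "the", "a"}
DOMAIN_STOPWORDS = {"chorus", "verse"}


def main_topics(tagged_tokens, n=10):
    """Return the n most frequent nouns among (word, tag) pairs.

    Assumption: tags follow the Penn Treebank convention, so every noun
    tag (NN, NNS, NNP, NNPS) starts with 'NN'.
    """
    stop = GENERAL_STOPWORDS | DOMAIN_STOPWORDS
    nouns = [
        word.lower()
        for word, tag in tagged_tokens
        if tag.startswith("NN") and word.lower() not in stop
    ]
    return [word for word, _ in Counter(nouns).most_common(n)]


if __name__ == "__main__":
    tagged = [("love", "NN"), ("love", "NN"), ("chorus", "NN"),
              ("run", "VB"), ("night", "NN"), ("Love", "NNP")]
    print(main_topics(tagged, n=2))
```

Counting lowercase forms merges ‘Love’ and ‘love’ into one topic; whether the original application does this is not stated.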
3.3 Sentiment Analysis
Several approaches can be used for sentiment
analysis (Maks, 2015). In this application, a
polarity lexicon is used to assess the overall
sentiment by determining how many positive and
how many negative words appear in the lyrics
(Breen, 2011). The algorithm used is based on the
approach of F. Alba (Alba, 2012) and comprises a
few steps.
First, every word in the lyrics is compared to the
words in the opinion lexicon, which contains a
set of ‘positive polarity words’ and a set of
‘negative polarity words’. When a word in the
lyrics matches a word in the lexicon, it is
annotated with a tag according to its polarity:
positive or negative. However, this is not
sufficient, since the lexicon does not account for
incrementers (‘very’, ‘super’) and decrementers
(‘barely’, ‘little’), which increase or decrease
the strength of the sentiment. In addition,
inverters (‘not’, ‘no’) invert the polarity of the
word, changing it from positive to negative or
vice versa. Therefore, additional dictionaries for
incrementers, decrementers and inverters are
used.
After every word is annotated with either a sen-
timent tag or none, the application keeps track of
two separate sentiment scores. One score keeps
track of all the positively classified words, the
other one of all the negatively classified words.
However, before a sentiment score is assigned to
a word, the previous tokens are checked for in-
crementers (in which case, the sentiment score
for that word is doubled), decrementers (in which
case, the sentiment score for that word is halved),
and inverters (in which case, the sentiment score
for that word is inverted).
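The scoring rules above can be sketched as follows. This is a minimal sketch, not the application’s implementation. The word lists are tiny hypothetical examples rather than the real lexicon, each matched word is assumed to start with a score of 1, only the single token directly before a sentiment word is checked, and an inverted word is assumed to count toward the opposite score. All names are invented for this example.

```python
# Tiny hypothetical word lists; the paper uses a full opinion lexicon and
# separate dictionaries for the three modifier types.
POSITIVE = {"love", "good", "happy"}
NEGATIVE = {"hate", "bad", "sad"}
INCREMENTERS = {"very", "super"}
DECREMENTERS = {"barely", "little"}
INVERTERS = {"not", "no"}


def sentiment_scores(tokens):
    """Return (positive_score, negative_score) for lowercase tokens.

    Assumptions: a base score of 1 per lexicon match, and only the
    immediately preceding token can modify it.
    """
    positive = negative = 0.0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            value = 1.0
        elif token in NEGATIVE:
            value = -1.0
        else:
            continue  # word carries no sentiment tag
        previous = tokens[i - 1] if i > 0 else None
        if previous in INCREMENTERS:
            value *= 2      # incrementer doubles the score
        elif previous in DECREMENTERS:
            value /= 2      # decrementer halves the score
        elif previous in INVERTERS:
            value = -value  # inverter flips the polarity
        if value > 0:
            positive += value
        else:
            negative += -value
    return positive, negative


if __name__ == "__main__":
    print(sentiment_scores("i very love you not good barely sad".split()))
```

In the usage line, ‘very love’ adds 2 to the positive score, ‘not good’ adds 1 to the negative score, and ‘barely sad’ adds 0.5 to the negative score.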

The results of the sentiment analysis consist of
two scores: one positive score and one negative
score. These scores are not interpretable without
scaling. Some artists have many songs and therefore
many available lyrics, which results in higher
scores than for artists with fewer lyrics.
Therefore, the sentiment scores are scaled
according to the number of sentiment-tagged words,
resulting in a percentage of positive
sentiment-carrying words and a percentage of
negative sentiment-carrying words.
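One way to perform this scaling is sketched below. The exact denominator used by the application is not specified; this sketch assumes each score is divided by the sum of the positive and negative scores, and the function name sentiment_percentages is invented for this example.

```python
def sentiment_percentages(positive_score, negative_score):
    """Scale two raw scores to percentages of their combined total.

    Assumption: the combined total of both scores is the denominator.
    Returns (0.0, 0.0) when no sentiment-tagged words were found.
    """
    total = positive_score + negative_score
    if total == 0:
        return 0.0, 0.0
    return 100 * positive_score / total, 100 * negative_score / total


if __name__ == "__main__":
    print(sentiment_percentages(30, 10))
```

With this scaling, an artist with 3000 positive and 1000 negative points receives the same percentages as one with 30 and 10, which removes the dependence on the number of available lyrics.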
3.4 Moodboard Generation
The results of the topic- and sentiment analysis
are visualized in a moodboard. This moodboard
generation consists of two steps. First, for each
of the ten main topics determined in the topic
analysis, five corresponding images are scraped
from Flickr, as well as five images of the artist.
Second, the resulting sentiment score from the
sentiment analysis is translated into the amount
of saturation in the moodboard. The higher the
sentiment score (thus, the more positive words in
the songs), the higher the amount of saturation in
the pictures. The lower the sentiment score (thus,
the more negative words in the songs), the lower
the amount of saturation in the pictures.
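The mapping from sentiment to saturation could look like the sketch below, which works on a single pixel using Python’s standard colorsys module. The paper does not state which image library or which mapping the application uses. The linear mapping from the positive percentage to a saturation multiplier, and both function names, are assumptions made for this example. For whole images, a library such as Pillow (for example its ImageEnhance.Color class) would be a more practical choice.

```python
import colorsys


def saturation_factor(positive_percentage):
    """Map 0..100 percent positive to a multiplier in 0..1.

    Assumption: a linear mapping, clamped to the valid range.
    """
    return max(0.0, min(1.0, positive_percentage / 100))


def apply_saturation(rgb, factor):
    """Scale the HSV saturation of one (r, g, b) pixel with 0..255 channels."""
    r, g, b = (channel / 255 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    r, g, b = colorsys.hsv_to_rgb(h, s * factor, v)
    return tuple(round(channel * 255) for channel in (r, g, b))


if __name__ == "__main__":
    factor = saturation_factor(68)  # e.g. 68 percent positive words
    print(apply_saturation((255, 0, 0), factor))
```

A factor of 1 leaves the pixel unchanged, while a factor of 0 removes all colour and leaves only its brightness.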
4 Results
The resulting application from this research is
able to generate a moodboard, based on topic
analysis and sentiment analysis of the queried
artist’s lyrics. The performance of a Natural
Language Processing system is often assessed with
a quantitative analysis. However, since no
annotated dataset of lyrics was available, this was
not possible. Therefore, in order to assess the
performance of the system, a manual approach was
used, in which a range of different artists was
queried. A handful of results from these queries
are included in Appendix I. It is difficult to
establish a valid performance measure for these
results. However, the lack of an exact performance
measure is not critical for the purpose of the
application. The results of the topic and sentiment
analysis are visualized in a moodboard, and
moodboards are not an exact science. The purpose of
a moodboard is to project an overall mood in a
visual way, so even when the accuracy of the topic
analysis and sentiment analysis is below optimal,
this is not immediately observable in the
moodboard. Whether the moodboard portrays an
artist’s sentiment and topics accurately is almost
as ambiguous as the lyrics themselves, and is left
to the interpretation of the user.
5 Discussion
There are many parts of the developed applica-
tion that can be improved. A few proposed future
improvements are described in this section.
As stated before, lyrics do not follow the same
syntactic structure as informative texts, which
makes it challenging for a Natural Language
Processing system to correctly determine the
Part-Of-Speech of each word. Words were not always
assigned the correct Part-Of-Speech, which
resulted in words being incorrectly identified as
nouns. This sometimes led to non-keywords being
identified as keywords and incorrectly projected
onto the moodboard. Even though this
misclassification is often not visible due to the
ambiguity of the moodboard, it is a flaw in the
application. A domain-specific Part-Of-Speech
tagger could help resolve this problem.
Also, even though the lexicons used for the
sentiment analysis were extensive, they were not
domain specific, which can lead to inaccurate
results. In addition, the positive polarity lexicon
consisted of 2002 words, while the negative
polarity lexicon consisted of 4767 words. It has
not been investigated for this application whether
this ratio is a correct representation of
sentiment-carrying words in the English language.
If it is not, the lexicons used may bias the
results toward negative polarity. Therefore,
research into the correct ratio of positive and
negative polarity words is necessary to improve
the accuracy.
Furthermore, only a rule-based approach was
used for this application. It could be useful to
explore whether machine learning or hybrid ap-
proaches would yield better results.
Finally, the application is currently a
‘stand-alone’ application, but if it were
integrated into a music software system such as
Spotify, personalization would be a great
improvement. Different people interpret songs in
different ways, so a useful addition would be an
opportunity for the user to provide feedback on the
resulting visualizations. This way, the parameters
of the application could be tuned to the user,
which could lead to a better user experience.
6 Link to Application
The application is not published online; however,
the code is available for download at
https://www.dropbox.com/s/241gr8r9cglf8eg/Visual_Songs.zip.
7 Group Work Summary
All group members brainstormed together about
the idea and worked their way through the lab
sessions. The actual application was mostly built
by Kayleigh and Nathalie, because they have
more experience with programming in Python
than Laila and Anita. Nathalie was responsible
for the scraper, the web-application using Python
CGI, and the sentiment analysis. Kayleigh was
responsible for the topic analysis and the mood-
board with images scraped from Flickr. The final
report was written by Laila, Anita, Kayleigh and
Nathalie together.
8 References
Alba, F. “Basic Sentiment Analysis in Python”. 1
Nov. 2012. Web. 23 Mar. 2015.
<http://fjavieralba.com/basic-sentiment-analysis-with-python.html>.
Breen, J. “Twitter Sentiment Analysis Tutorial
201107: Opinion Lexicon English”. GitHub.
12 Jul. 2011. Web. 23 Mar. 2015.
<https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/tree/master/data/opinion-lexicon-English>.
Bird, S., Klein, E. and Loper, E. Natural Language
Processing with Python, 79-128. First Edition
(2009). California: O’Reilly Media Inc. Web.
Fokkens, A. “Introduction to NLP”. Blackboard
Learn VU. Web, Lecture 10 Feb. 2015.
Maks, I. “Text Mining 2015: Sentiment Analysis
& Opinion Mining”. Blackboard Learn
VU. Web, Lecture 24 Feb. 2015.

Appendix I

Spice Girls
The results of the sentiment analysis of Spice Girls are: 68 percent of the classified words were positive,
and 31 percent of the words were negative.
The main topics of Spice Girls are: time, love, come, something, night, fun, baby, lover, deeper.
The resulting moodboard is displayed in Image 1.
Image 1: Resulting moodboard for the Spice Girls.

Ellie Goulding
The results of the sentiment analysis of Ellie Goulding are: 43 percent of the classified words were positive,
and 56 percent of the words were negative.
The main topics of Ellie Goulding are: burn, love, time, baby, heart, fire, lights, life, anything.
Image 2: Resulting moodboard for Ellie Goulding.

Slipknot
The results of the sentiment analysis of Slipknot are: 23 percent of the classified words were positive,
The main topics of Slipknot are: inside, build, fuck, life, everything, end, goodbye, man, eyes.
Image 3: Resulting moodboard for Slipknot.

ABBA
The results of the sentiment analysis of ABBA are: 53 percent of the classified words were positive,
The main topics of ABBA are: man, mother, honey, waterloo, midnight, take, elaine, nothing.
Image 4: Resulting moodboard for ABBA.
