Visualizing music artists’ main topics and overall sentiment by
analyzing lyrics
Abstract
This paper discusses the process of visualizing
music artists’ main topics and overall sentiment
by analyzing their lyrics. While artists
themselves translate their lyrics into sound, we
address the problem of visualizing their songs by
automatically generating a moodboard from the
written lyrics. This is the main problem of this
paper, because visualized lyrics can be valuable
for deaf people, and analyzing them may be of
commercial use to music software companies. The
approach involves scraping the lyrics from the
web, preprocessing the scraped texts, and
analyzing their topics and overall sentiment.
Based on the main topics, corresponding images
are then scraped from the web and placed on the
moodboard. Based on the sentiment score, a
matching level of saturation is applied to these
images. Finally, some results are presented and
discussed, followed by a link to the final
application.
1 Introduction
Every music artist is unique, just like every song
is unique. Songs can express positive or negative
emotions through their sounds and lyrics. While
humans can comprehend the emotions of a song by
simply listening to it, a Natural Language
Processing system requires a different approach to
perform this task, since it can only use the
textual component: the lyrics. Without sound, it is
more difficult to extract the sentiment of a song,
since all that is left are words. Another problem
is that lyrics do not follow the same syntactic
rules as informative texts. The language structure
of lyrics is more similar to that of poetry. Lyrics
can be ambiguous, since they often contain
metaphors, idioms and polysemous words; the
interpretation is left to the listener and can
differ from person to person. Enabling an automated
system to interpret lyrics is therefore a demanding
task. However, lyrics might provide an automated
system with enough information about a song to
detect its main topics and overall sentiment. This
research therefore analyzes whether Natural
Language Processing can extract the sentiment of
songs based on their lyrics only. It also examines
how to extract the main topics of an artist’s songs
that are embedded in the lyrics.
The challenge and main problem statement of this
paper is to express the extracted sentiment and
topics of an artist’s songs in a different way than
through sound: visually, by generating a moodboard.
A moodboard is chosen as the visualization method
because it carries the same ambiguity as a song,
which makes it a good fit for the domain.
The resulting moodboard of an artist’s main topics
and overall sentiment can be useful for several
purposes. It would offer deaf people an opportunity
to comprehend music through their visual sense,
which can bring them closer to a domain from which
they are often distanced. In addition, these
moodboards could be useful for music software
companies such as Spotify. Spotify is well aware
that music and mood are intertwined, and responds
to this by offering playlists based on moods.
Embedding the proposed moodboard of an artist’s
topics and sentiment in these playlists would add a
visual dimension.
Kayleigh Beard
Vrije Universiteit
Amsterdam, Nederland
k.l.beard@student.vu.nl
Anita Tran
Vrije Universiteit
Amsterdam, Nederland
a.v.t.t.tran@student.vu.nl
Nathalie Post
Vrije Universiteit
Amsterdam, Nederland
n3.post@student.vu.nl
Laila van Ments
Vrije Universiteit
Amsterdam, Nederland
l.ments@student.vu.nl
2 Data Used
2.1 Lyrics
The data used consists of lyrics from the website
‘http://www.songteksten.nl’. However, no data is
stored within the application itself. A user of the
application can query an artist, and only after
this query is the data retrieved by scraping all of
the artist’s lyrics from
‘http://www.songteksten.nl’ using ‘Scrapy’, a
Python framework for scraping websites.
2.2 Images
Based on the queried artist and the results of the
topic analysis, images are scraped from the
photo-sharing website Flickr,
‘http://www.flickr.com’.
3 Methodology
This section describes the four separate steps that
take place within the application. After the lyrics
are scraped, they are preprocessed, as described in
Section 3.1. Subsequently, the main topics of the
lyrics are extracted (Section 3.2) and sentiment
analysis is conducted (Section 3.3). Finally, the
results from the topic analysis and sentiment
analysis are used to generate a moodboard
(Section 3.4).
3.1 Preprocessing the text
The scraped lyrics are preprocessed in order to
normalize the text. The first step of preprocessing
consists of sentence segmentation and tokenization.
After this, Part-Of-Speech tagging is applied, in
which each word of the text is marked with a part
of speech, based on its definition and context.
Finally, redundant characters and punctuation marks
(colons, apostrophes, hyphens, strokes, parentheses
and square brackets) are removed from the text
(Fokkens, 2015).
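As a rough illustration, the segmentation, tokenization and clean-up steps could look like the sketch below. It uses only the Python standard library, so it is a simplified stand-in and not the application's actual code. The function names, the regular expression and the choice to treat every lyric line as one sentence are assumptions made for this example. Part-Of-Speech tagging, which would sit between tokenization and clean-up, is left out because it requires a trained tagger.

```python
import re

# Characters the text lists as redundant: colons, apostrophes, hyphens,
# strokes (read here as slashes), parentheses and square brackets.
REDUNDANT = set(":'’-/()[]")


def segment(lyrics):
    """Assumption: every non-empty lyric line counts as one sentence."""
    return [line.strip() for line in lyrics.splitlines() if line.strip()]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def strip_redundant(tokens):
    """Drop the tokens that consist of a redundant character."""
    return [token for token in tokens if token not in REDUNDANT]


if __name__ == "__main__":
    sample = "[Chorus]\nLove me (baby): all night\n"
    for sentence in segment(sample):
        print(strip_redundant(tokenize(sentence)))
```

In this sketch a section marker such as ‘[Chorus]’ survives as the token ‘Chorus’, which is one reason why domain-dependent words still need to be filtered in a later step.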
3.2 Topic analysis
The most meaningful words of a sentence are its
keywords. In order to extract the main topics from
lyrics, the goal was to extract the keywords from
the lyrics. Keywords are most often nouns, and
therefore only nouns were extracted from the text
(Common Noun, Proper Noun, Proper Noun Singular
Form and Proper Noun Plural Form) and stored in a
list.
Not all the extracted nouns in this list are actual
keywords, and some of them should not be part of
the topic analysis. The NLTK ‘Stopwords’ corpus is
used to filter out these words. However, the
‘Stopwords’ corpus does not filter out all
redundant words, especially domain-dependent words
(such as ‘chorus’ and ‘verse’). During thorough
testing of the application, an additional stopword
list was compiled in order to filter out those
redundant words as well.
After a list of proper keywords is established, a
frequency distribution of the words in this list is
generated. From this frequency distribution the ten
most common words are selected, which represent the
ten main topics of the queried artist (Bird, Klein
and Loper, 2009).
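The noun filtering and frequency count described in this section can be sketched as follows. The sketch assumes that the tokens have already been tagged and that the tagger emits Penn Treebank noun tags (NN, NNS, NNP, NNPS). The small stopword sets are placeholders for the NLTK ‘Stopwords’ corpus and the additional domain list, and `main_topics` is a hypothetical name chosen for this example.

```python
from collections import Counter

# Assumption: Penn Treebank tags for common and proper nouns.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Placeholders: the application uses the NLTK stopword corpus plus an
# additional hand-made list of domain-dependent words.
GENERAL_STOPWORDS = {"i", "you", "the", "a"}
DOMAIN_STOPWORDS = {"chorus", "verse"}


def main_topics(tagged_tokens, n=10):
    """Return the n most frequent nouns that are not stopwords."""
    stopwords = GENERAL_STOPWORDS | DOMAIN_STOPWORDS
    nouns = [
        word.lower()
        for word, tag in tagged_tokens
        if tag in NOUN_TAGS and word.lower() not in stopwords
    ]
    return [word for word, _count in Counter(nouns).most_common(n)]


if __name__ == "__main__":
    tagged = [
        ("love", "NN"), ("Love", "NN"), ("baby", "NN"), ("run", "VB"),
        ("chorus", "NN"), ("love", "NN"), ("baby", "NN"), ("night", "NN"),
    ]
    print(main_topics(tagged))
```

`Counter.most_common` plays the role of the frequency distribution here; NLTK's `FreqDist` offers the same kind of interface.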
3.3 Sentiment Analysis
Several approaches can be used for sentiment
analysis (Maks, 2015). In this application, a
polarity lexicon is used to assess the overall
sentiment by determining how many positive and how
many negative words appear in the lyrics (Breen,
2011). The algorithm used is based on the approach
of F. Alba (Alba, 2012) and comprises a few steps.
First, every word in the lyrics is compared to the
words in the opinion lexicon, which contains a set
of ‘positive polarity words’ and a set of ‘negative
polarity words’. When a word in the lyrics matches
a word in the lexicon, it is annotated with a tag
according to its polarity: positive or negative.
However, this is not sufficient, since the lexicon
does not account for incrementers (‘very’, ‘super’)
and decrementers (‘barely’, ‘little’), which
increase or decrease the strength of the sentiment.
In addition, inverters (‘not’, ‘no’) invert the
polarity of a word, changing it from positive to
negative or vice versa. Therefore, additional
dictionaries for incrementers, decrementers and
inverters are used.
After every word is annotated with either a
sentiment tag or none, the application keeps track
of two separate sentiment scores. One score keeps
track of all the positively classified words, the
other of all the negatively classified words.
However, before a sentiment score is assigned to a
word, the previous tokens are checked for
incrementers (in which case the sentiment score for
that word is doubled), decrementers (in which case
the sentiment score for that word is halved) and
inverters (in which case the sentiment score for
that word is inverted).
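The scoring rules above can be sketched as follows. This is a minimal illustration under several assumptions: every lexicon match starts with a weight of 1, only the token directly before a sentiment word is checked for a modifier, and the word sets are tiny stand-ins for the opinion lexicon and the modifier dictionaries. The function name `score_tokens` is hypothetical.

```python
# Tiny stand-in word sets; the application uses a full opinion lexicon
# and separate dictionaries for the three kinds of modifiers.
POSITIVE = {"love", "happy", "good"}
NEGATIVE = {"hate", "sad", "bad"}
INCREMENTERS = {"very", "super"}
DECREMENTERS = {"barely", "little"}
INVERTERS = {"not", "no"}


def score_tokens(tokens):
    """Return (positive_score, negative_score) for a list of tokens."""
    positive = 0.0
    negative = 0.0
    for index, token in enumerate(tokens):
        if token in POSITIVE:
            value = 1.0
        elif token in NEGATIVE:
            value = -1.0
        else:
            continue  # word carries no sentiment tag

        # Assumption: only the directly preceding token is inspected.
        previous = tokens[index - 1] if index > 0 else None
        if previous in INCREMENTERS:
            value *= 2
        elif previous in DECREMENTERS:
            value /= 2
        elif previous in INVERTERS:
            value = -value

        if value > 0:
            positive += value
        else:
            negative += -value
    return positive, negative


if __name__ == "__main__":
    print(score_tokens("i am very happy".split()))
    print(score_tokens("this is not good".split()))
```

With these rules an inverted positive word such as ‘not good’ is added to the negative score, which matches the description of inverters changing polarity.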
The results of the sentiment analysis consist of
two scores: one positive score and one negative
score. These scores are not interpretable without
scaling. Some artists have many songs and therefore
many available lyrics, which results in higher
scores than for artists with fewer lyrics.
Therefore, the sentiment scores are scaled
according to the number of sentiment-tagged words,
resulting in a percentage of positive
sentiment-carrying words and a percentage of
negative sentiment-carrying words.
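One way to perform this scaling is shown in the sketch below. The exact normalization is not spelled out in the text, so dividing each score by the sum of both scores is an assumption, as is the function name `sentiment_percentages`.

```python
def sentiment_percentages(positive_score, negative_score):
    """Scale two raw sentiment scores to percentages of their sum.

    Assumption: both scores are non-negative and the sum of the two
    scores stands in for the number of sentiment-tagged words.
    """
    total = positive_score + negative_score
    if total == 0:
        return 0.0, 0.0  # no sentiment-tagged words were found
    return (
        100.0 * positive_score / total,
        100.0 * negative_score / total,
    )


if __name__ == "__main__":
    print(sentiment_percentages(3.0, 1.0))
```

Because both percentages are relative to the same total, an artist with hundreds of songs and an artist with a single song become directly comparable.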
3.4 Moodboard Generation
The results of the topic and sentiment analyses are
visualized in a moodboard. The moodboard generation
consists of two steps. First, for each of the ten
main topics determined in the topic analysis, five
corresponding images are scraped from Flickr, as
well as five images of the artist. Second, the
sentiment score from the sentiment analysis is
translated into the amount of saturation in the
moodboard. The higher the sentiment score (thus,
the more positive words in the songs), the higher
the saturation of the pictures. The lower the
sentiment score (thus, the more negative words in
the songs), the lower the saturation of the
pictures.
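The translation from sentiment to saturation could be sketched as below for a single pixel. The text does not state the exact mapping, so the linear mapping from the positive percentage to a saturation factor between 0 and 1 is an assumption, and both function names are hypothetical. The sketch uses the standard `colorsys` module; a real implementation would apply the same factor to whole images.

```python
import colorsys


def saturation_factor(positive_percentage):
    """Assumed linear mapping: 0 percent -> 0.0, 100 percent -> 1.0."""
    return max(0.0, min(1.0, positive_percentage / 100.0))


def adjust_pixel(rgb, factor):
    """Scale the saturation of one (R, G, B) pixel with 0-255 channels."""
    red, green, blue = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)
    red, green, blue = colorsys.hsv_to_rgb(hue, saturation * factor, value)
    return tuple(round(channel * 255) for channel in (red, green, blue))


if __name__ == "__main__":
    factor = saturation_factor(68)  # e.g. 68 percent positive words
    print(factor, adjust_pixel((255, 0, 0), factor))
```

A factor of 1.0 leaves a pixel unchanged, while a factor of 0.0 removes all colour and leaves only its brightness.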
4 Results
The application resulting from this research is
able to generate a moodboard based on topic
analysis and sentiment analysis of the queried
artist’s lyrics. The performance of a Natural
Language Processing system is often assessed with
a quantitative analysis. However, since no
annotated dataset of lyrics was available, this
was not possible. Therefore, in order to assess the
performance of the system, a manual approach was
used, in which a range of different artists was
queried. A handful of results from these queries
are attached in Appendix I. It is difficult to
establish a valid performance measure for these
results. However, the lack of an exact performance
measure is not critical for the purpose of the
application. The results of the topic and sentiment
analyses are visualized in a moodboard, and
moodboards are not an exact science. The purpose of
a moodboard is to project an overall mood in a
visual way, so even when the accuracy of the topic
analysis and sentiment analysis is below optimal,
this is not immediately observable in the
moodboard. Whether the moodboard portrays an
artist’s sentiment and topics accurately is almost
as ambiguous as the lyrics themselves, and is left
to the interpretation of the user.
5 Discussion
There are many parts of the developed application
that can be improved. A few proposed future
improvements are described in this section.
As stated before, lyrics do not follow the same
syntactic structure as informative texts, which
makes it challenging for a Natural Language
Processing system to correctly determine the
Part-Of-Speech of each word. Words were not always
assigned the correct Part-Of-Speech, which resulted
in words being incorrectly identified as nouns.
This sometimes led to non-keywords being identified
as keywords and incorrectly projected onto the
moodboard. Even though this misclassification is
often not visible due to the ambiguity of the
moodboard, it is a flaw in the application. A
domain-specific Part-Of-Speech tagger could help
resolve this problem.
Also, even though the lexicons used for the
sentiment analysis were extensive, they were not
domain-specific, which can lead to inaccurate
results. In addition, the positive polarity lexicon
consisted of 2002 words, while the negative
polarity lexicon consisted of 4767 words. It has
not been investigated for the purposes of this
application whether this ratio is a correct
representation of sentiment-carrying words in the
English language. If not, it is possible that the
lexicons used result in a bias toward negative
polarity. Therefore, research into the correct
ratio of positive and negative polarity-carrying
words is necessary to improve the accuracy.
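To make the size of this imbalance concrete, the share of each lexicon can be computed from the two word counts given above. The variable names are chosen for this example only.

```python
positive_words = 2002  # size of the positive polarity lexicon
negative_words = 4767  # size of the negative polarity lexicon

total_words = positive_words + negative_words
positive_share = 100.0 * positive_words / total_words
negative_share = 100.0 * negative_words / total_words

print(f"{total_words} words: {positive_share:.1f}% positive, "
      f"{negative_share:.1f}% negative")
```

Roughly 30 percent of the lexicon entries are positive and 70 percent are negative, so a word drawn at random from the combined lexicon is more than twice as likely to be negative.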
Furthermore, only a rule-based approach was used
for this application. It could be useful to explore
whether machine learning or hybrid approaches would
yield better results.
Finally, this application is currently a
‘stand-alone’ application, but if it were
integrated into a music software system such as
Spotify, it would be a great improvement if the
application could be personalized. Different people
interpret songs in different ways, so a useful
addition to the system would be an opportunity for
the user to provide feedback about the resulting
visualizations. This way, the parameters of the
application could be tuned to the user, which could
lead to a better user experience.
6 Link to Application
The application is not published online; however,
the code is available for download at
https://www.dropbox.com/s/241gr8r9cglf8eg/Visual_Songs.zip.
7 Group Work Summary
All group members brainstormed together about the
idea and worked their way through the lab sessions.
The actual application was mostly built by Kayleigh
and Nathalie, because they have more experience
with programming in Python than Laila and Anita.
Nathalie was responsible for the scraper, the web
application using Python CGI, and the sentiment
analysis. Kayleigh was responsible for the topic
analysis and the moodboard with images scraped from
Flickr. The final report was written by Laila,
Anita, Kayleigh and Nathalie together.
8 References
Alba, F. “Basic Sentiment Analysis in Python”. 1
Nov. 2012. Web. 23 Mar. 2015.
<http://fjavieralba.com/basic-sentiment-analysis-with-python.html>.
Breen, J. “Twitter Sentiment Analysis Tutorial
201107: Opinion Lexicon English”. GitHub. 12 Jul.
2011. Web. 23 Mar. 2015.
<https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/tree/master/data/opinion-lexicon-English>.
Bird, S., Klein, E. and Loper, E. Natural Language
Processing with Python, 79-128. First Edition
(2009). California: O’Reilly Media Inc. Web.
Fokkens, A. “Introduction to NLP”. Blackboard Learn
VU. Web, Lecture 10 Feb. 2015.
Maks, I. “Text Mining 2015: Sentiment Analysis &
Opinion Mining”. Blackboard Learn VU. Web, Lecture
24 Feb. 2015.
Appendix I
Spice Girls
The results of the sentiment analysis of Spice Girls are: 68 percent of classified words were positive, and 31 percent of the words were negative.
The main topics of Spice Girls are: time, love, come, something, night, fun, baby, lover, deeper.
The resulting moodboard is displayed in image 1.
Image 1: Resulting moodboard for the Spice Girls.
Ellie Goulding
The results of the sentiment analysis of Ellie Goulding are: 43 percent of classified words were positive, and 56 percent of the words were negative.
The main topics of Ellie Goulding are: burn, love, time, baby, heart, fire, lights, life, anything.
The resulting moodboard is displayed in image 2.
Image 2: Resulting moodboard for Ellie Goulding.
Slipknot
The results of the sentiment analysis of Slipknot are: 23 percent of classified words were positive, and 76 percent of the words were negative.
The main topics of Slipknot are: inside, build, fuck, life, everything, end, goodbye, man, eyes.
The resulting moodboard is displayed in image 3.
Image 3: Resulting moodboard for Slipknot.
ABBA
The results of the sentiment analysis of ABBA are: 53 percent of classified words were positive, and 46 percent of the words were negative.
The main topics of ABBA are: man, mother, honey, waterloo, midnight, take, elaine, nothing.
The resulting moodboard is displayed in image 4.
Image 4: Resulting moodboard for ABBA.

More Related Content

Similar to Visualizing+music+artists’+main+topics+and+overall+sentiment+by+analyzing+lyrics

Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...
Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...
Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...IJECEIAES
 
Mehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment DetectedMehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment DetectedIRJET Journal
 
Poster vega north
Poster vega northPoster vega north
Poster vega northAcxelVega
 
Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityijaia
 
Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022AndriaLesane
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYijaia
 
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...Kosetsu Tsukuda
 
An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.IJSRD
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWJournal For Research
 
IRJET- Feeling based Music Recommnendation System using Sensors
IRJET- Feeling based Music Recommnendation System using SensorsIRJET- Feeling based Music Recommnendation System using Sensors
IRJET- Feeling based Music Recommnendation System using SensorsIRJET Journal
 
IRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET Journal
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaSankalp Gulati
 
Aspect mining and sentiment association
Aspect mining and sentiment associationAspect mining and sentiment association
Aspect mining and sentiment associationKoushik Ramachandra
 
Writing Development Of ELL Students In A Music Classroom
Writing Development Of ELL Students In A Music ClassroomWriting Development Of ELL Students In A Music Classroom
Writing Development Of ELL Students In A Music ClassroomValerie Erickson-Mesias
 

Similar to Visualizing+music+artists’+main+topics+and+overall+sentiment+by+analyzing+lyrics (20)

auto_playlist
auto_playlistauto_playlist
auto_playlist
 
Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...
Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...
Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion...
 
Mehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment DetectedMehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment Detected
 
Mood Detection
Mood DetectionMood Detection
Mood Detection
 
Poster vega north
Poster vega northPoster vega north
Poster vega north
 
Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivity
 
Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
 
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
 
An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
 
IRJET- Feeling based Music Recommnendation System using Sensors
IRJET- Feeling based Music Recommnendation System using SensorsIRJET- Feeling based Music Recommnendation System using Sensors
IRJET- Feeling based Music Recommnendation System using Sensors
 
NLP todo
NLP todoNLP todo
NLP todo
 
Song Comparison Essay.pdf
Song Comparison Essay.pdfSong Comparison Essay.pdf
Song Comparison Essay.pdf
 
IRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET- The Complete Music Player
IRJET- The Complete Music Player
 
NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music Corpora
 
Aspect mining and sentiment association
Aspect mining and sentiment associationAspect mining and sentiment association
Aspect mining and sentiment association
 
Writing Development Of ELL Students In A Music Classroom
Writing Development Of ELL Students In A Music ClassroomWriting Development Of ELL Students In A Music Classroom
Writing Development Of ELL Students In A Music Classroom
 

Visualizing+music+artists’+main+topics+and+overall+sentiment+by+analyzing+lyrics

  • 1. Visualizing music artists’ main topics and overall sentiment by analyzing lyrics Abstract This paper discusses the process of visualizing music artists’ main topics and overall sentiment by analyzing lyrics. While artists themselves translate their lyrics into sound, we are chal- lenged by the problem of visualizing their songs and automatically generating a moodboard based on written songs. This forms the main problem of this article, because visualized lyrics can be valuable for deaf people, and analyzing them may be of commercial use for music software companies. The approach to this problem in- volves scraping the lyrics from the web, pre- processing the scraped texts and analysis of top- ics and overall sentiment. Furthermore, based on the main topics, corresponding images are scraped from the web and placed into the mood- board. Based on the sentiment score, a matching level of saturation is given to these images. Fi- nally, some results are given and discussed fol- lowed by a link to the final application. 1 Introduction Every music artist is unique, just like every song is unique. Songs can express positive or negative emotions through their sounds and lyrics. While humans can comprehend the emotions of a song by simply listening to it, a different approach is required for a Natural Language Processing sys- tem to perform this task, since only the textual component of lyrics can be utilized. Without sound, it is more difficult to extract the sentiment of a song, since all that is left are words. Another problem is that lyrics do not follow the same syntactic rules as informative texts. The language structure that is adhered to in lyrics is more simi- lar to the structure of poetry. Lyrics can be am- biguous, since they often contain metaphors, idi- oms and polysemous words; the interpretation is left to the listener of the song, and can be inter- preted differently by different people. 
To enable an automated system to interpret lyrics is there- fore a demanding task. However, lyrics might be able to provide an automated system enough in- formation about a song to detect its main topics and overall sentiment. Therefore, in this research is analyzed whether it is possible to use Natural Language Processing in order to extract the sen- timent of songs based on their lyrics only. It is also analyzed how to extract the main topics of an artist’s songs, that are embedded in the lyrics. The challenge and main problem statement of this paper is to express the extracted sentiment and topics of the songs of an artist in a different way than sound: visually, by generating a mood- board. A moodboard as visualization method is chosen because it has the convenience of carry- ing the same ambiguity as a song, which makes it a perfect fit for the domain. The resulting moodboard of an artist’s main top- ics and overall sentiment can be useful for sever- al purposes. It would offer an opportunity for deaf people to comprehend music by taking ad- vantage of their visual senses. This can bring them closer to a domain that they are often dis- tanced from. Besides that, these moodboards could also be useful for music software compa- nies such as Spotify. Spotify is well aware of the fact that music and mood are intertwined, and anticipate on that by offering playlists based on moods. An addition to these playlists would be embedding the proposed moodboard of an art- ist’s topics and sentiment, for an additional visu- al sensation. Kayleigh Beard Vrije Universiteit Amsterdam, Nederland k.l.beard@student.vu.nl Anita Tran Vrije Universiteit Amsterdam, Nederland a.v.t.t.tran@student.vu.nl Nathalie Post Vrije Universiteit Amsterdam, Nederland n3.post@student.vu.nl Laila van Ments Vrije Universiteit Amsterdam, Nederland l.ments@student.vu.nl
  • 2. 2 Data Used 2.1 Lyrics The used data consists of lyrics from the website ‘http://www.songteksten.nl’. However, no data is stored within the application itself. A user of the application can query an artist, and only after this query the data is retrieved by scraping all the artist’s lyrics from ‘http://www.songteksten.nl’, using ‘Scrapy’, a Python module for scraping websites. 2.2 Images Based on the queried artist, and the results of the topic analysis, images are scraped from photo- sharing website Flickr, ‘http://www.flickr.com’. 3 Methodology This section describes the four separate steps that take place within the application. After the lyrics are scraped, the lyrics are preprocessed, as de- scribed in Section 3.1. Subsequently, the main topics of the lyrics are extracted (Section 3.2) and sentiment analysis is conducted (Section 3.3). Finally, the results from the topic analysis and sentiment analysis are used in order to gen- erate a moodboard (Section 3.4). 3.1 Preprocessing the text The scraped lyrics are preprocessed in order to normalize the text. The first step of prepro- cessing consists of sentence segmentation and tokenization. After this, Part-Of-Speech tagging is applied, in which each word of the text is marked with a part of speech, based on its defini- tion and context. Finally, redundant characters and punctuation marks (colons, apostrophes, hy- phens, strokes, parentheses and square brackets) are removed from the text (Fokkens, 2015). 3.2 Topic analysis The most meaningful words of a sentence are keywords. In order to extract the main topics from lyrics, the goal was to filter out the key- words from the lyrics. Keywords are most often contained in nouns, and therefore only nouns were extracted from the text (Common Noun, Proper Noun, Proper Noun Singular Form and Proper Noun Plural Form), and stored in a list. 
Not all the extracted nouns in this list are actual keywords, and some of the extracted nouns should not be part of the topic analysis. The NLTK ‘Stopwords’ corpus is used in order to filter out these words. However, the ‘Stopwords’ corpus does not filter out all redundant words, especially the domain dependent words (such as ‘chorus’, and ‘verse’). During a thorough test procedure of the application, an additional stop- words list was generated in order to filter out those redundant words as well. After a list of proper keywords is established, a frequency distribution of the most common words in this list is generated. From this frequen- cy distribution the ten most common words are assembled, which represent the ten main topics of the queried artist (Bird, S., Klein, E. and Lop- er, E., 2009). 3.3 Sentiment Analysis Several approaches can be used for sentiment analysis (Maks, 2015). In this application, a po- larity lexicon is used to assess the overall senti- ment, by determining how many positive and how many negative words appear in the lyrics (Breen, 2011). The used algorithm is based on the approach of F. Alba (Alba, 2012), and com- prises of a few steps. First, every word in the lyrics is compared to the words in the opinion lexicon, which contains a set of ‘positive polarity words’ and a set of ‘neg- ative polarity words’. When a word contained in the lyrics matches a word in the lexicon, it is an- notated with a tag according to its polarity: posi- tive or negative. However, this is not sufficient, since this lexicon does not account for incre- menters (‘very’, ‘super’) and decrementers (‘barely’, ‘little’), which enhance or decrease the strength of the sentiment. Besides that, inverters (‘not’, ‘no’), invert the entire polarity of the word, changing it from positive to negative or vice versa. Therefore, additional dictionaries for incrementers, decrementers and inverters are uti- lized. 
After every word is annotated with either a sen- timent tag or none, the application keeps track of two separate sentiment scores. One score keeps track of all the positively classified words, the other one of all the negatively classified words. However, before a sentiment score is assigned to a word, the previous tokens are checked for in- crementers (in which case, the sentiment score for that word is doubled), decrementers (in which case, the sentiment score for that word is halved), and inverters (in which case, the sentiment score for that word is inverted).
  • 3. The results of the sentiment analysis consist of two scores: one positive score, and one negative score. These resulting scores are not interpretable without scaling. Some artists have many songs and therefore many available lyrics, which re- sults in higher scores than artists with fewer lyr- ics. Therefore, the resulting sentiment scores are scaled according to the amount of sentiment tagged words, resulting in a percentage of posi- tive sentiment carrying words and a percentage of negative sentiment carrying words. 3.4 Moodboard Generation The results of the topic- and sentiment analysis are visualized in a moodboard. This moodboard generation consists of two steps. First, for each of the ten main topics determined in the topic analysis, five corresponding images are scraped from Flickr, as well as five images of the artist. Second, the resulting sentiment score from the sentiment analysis is translated into the amount of saturation in the moodboard. The higher the sentiment score (thus, the more positive words in the songs), the higher the amount of saturation in the pictures. The lower the sentiment score (thus, the more negative words in the songs), the lower the amount of saturation in the pictures. 4 Results The resulting application from this research is able to generate a moodboard, based on topic analysis and sentiment analysis of the queried artist’s lyrics. To assess the performance of a Natural Language Processing system, often a quantitative analysis is used. However, since there was no annotated dataset of lyrics, this was impossible. Therefore, in order to determine the performance of the system, a manual approach was used, in which a range of different artists was queried. A handful of results from these que- ries are attached in Appendix I. It is difficult to establish a valid performance measure of these results. 
However, the lack of an exact performance measure is of limited relevance for the purpose of the application. The results of the topic and sentiment analysis are visualized in a moodboard, and moodboards are not an exact science. The purpose of a moodboard is to project an overall mood in a visual way, so even when the accuracy of the topic analysis and sentiment analysis is below optimal, this is not immediately observable in the moodboard. Whether the moodboard portrays an artist’s sentiment and topics accurately is almost as ambiguous as the lyrics themselves, and is left to the interpretation of the user.
5 Discussion
Many parts of the developed application can be improved. A few proposed future improvements are described in this section.
As stated before, lyrics do not follow the same syntactic structure as informative texts, which makes it challenging for a Natural Language Processing system to correctly determine the Part-Of-Speech of each word. Words were not always assigned the correct Part-Of-Speech, which resulted in words being incorrectly identified as nouns. This sometimes led to non-keywords being identified as keywords and incorrectly projected on the moodboard. Even though this misclassification is often not visible due to the ambiguity of the moodboard, it is a flaw in the application. A domain-specific Part-Of-Speech tagger could help resolve this problem.
Also, even though the lexicons used for the sentiment analysis were extensive, they were not domain specific, which can lead to inaccurate results. In addition, the positive polarity lexicon consisted of 2002 words, while the negative polarity lexicon consisted of 4767 words. Whether this ratio correctly represents the sentiment-carrying words in the English language has not been researched for the purposes of this application.
If not, the lexicons used may introduce a bias toward negative polarity. Research into the correct ratio of positive and negative polarity-carrying words is therefore necessary to improve the accuracy.
Furthermore, only a rule-based approach was used for this application. It could be useful to explore whether machine learning or hybrid approaches would yield better results.
Finally, the application is currently stand-alone. If it were integrated into a music software system such as Spotify, personalization would be a great improvement. Different people interpret songs in different ways, so a useful addition to the system would be an opportunity for the user to provide feedback on the resulting visualizations. The parameters of the application could then be tuned to the user, which could lead to a better user experience.
6 Link to Application
The application is not published online; however, the code is available for download at https://www.dropbox.com/s/241gr8r9cglf8eg/Visual_Songs.zip.
7 Group Work Summary
All group members brainstormed together about the idea and worked their way through the lab sessions. The actual application was mostly built by Kayleigh and Nathalie, because they have more experience with programming in Python than Laila and Anita. Nathalie was responsible for the scraper, the web application using Python CGI, and the sentiment analysis. Kayleigh was responsible for the topic analysis and the moodboard with images scraped from Flickr. The final report was written by Laila, Anita, Kayleigh and Nathalie together.
8 References
Alba, F. “Basic Sentiment Analysis in Python”. 1 Nov. 2012. Web. 23 Mar. 2015. <http://fjavieralba.com/basic-sentiment-analysis-with-python.html>.
Bird, S., Klein, E. and Loper, E. Natural Language Processing with Python, 79-128. First Edition (2009). California: O’Reilly Media Inc. Web.
Breen, J. “Twitter Sentiment Analysis Tutorial 201107: Opinion Lexicon English”. GitHub. 12 Jul. 2011. Web. 23 Mar. 2015. <https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/tree/master/data/opinion-lexicon-English>.
Fokkens, A. “Introduction to NLP”. Blackboard Learn VU. Web, Lecture 10 Feb. 2015.
Maks, I. “Text Mining 2015: Sentiment Analysis & Opinion Mining”. Blackboard Learn VU. Web, Lecture 24 Feb. 2015.
Appendix I
Spice Girls
The results of the sentiment analysis of Spice Girls are: 68 percent of the classified words were positive, and 31 percent were negative. The main topics of Spice Girls are: time, love, come, something, night, fun, baby, lover, deeper. The resulting moodboard is displayed in image 1.
Image 1: Resulting moodboard for the Spice Girls.
Ellie Goulding
The results of the sentiment analysis of Ellie Goulding are: 43 percent of the classified words were positive, and 56 percent were negative. The main topics of Ellie Goulding are: burn, love, time, baby, heart, fire, lights, life, anything. The resulting moodboard is displayed in image 2.
Image 2: Resulting moodboard for Ellie Goulding.
Slipknot
The results of the sentiment analysis of Slipknot are: 23 percent of the classified words were positive, and 76 percent were negative. The main topics of Slipknot are: inside, build, fuck, life, everything, end, goodbye, man, eyes. The resulting moodboard is displayed in image 3.
Image 3: Resulting moodboard for Slipknot.
ABBA
The results of the sentiment analysis of ABBA are: 53 percent of the classified words were positive, and 46 percent were negative. The main topics of ABBA are: man, mother, honey, waterloo, midnight, take, elaine, nothing. The resulting moodboard is displayed in image 4.
Image 4: Resulting moodboard for ABBA.