Sentiment analysis of song lyrics (Sep 2014)
Deepanjan Kundu (120050009)
Siddhartha Dutta (120040005)
Pratyaksh Sharma (120050019)
Prateesh Goyal (120050013)

Input/Output
Input: song lyrics
Output: Sentiment exhibited by the song
Refined to 2 labels:
Positive: Happy, Romantic
Negative: Sad, Angry

Example of output
Need to grow older with a girl like you
Finally see you are naturally
The one to make it so easy
When you show me the truth
Yeah, I’d rather be with you
Say you want the same thing too
TAGS: ROMANCE (positive)

DATA SET
1. Used a web crawler to collect lyrics from a
few listed websites to form our data set.
Some of the sites were:
a. www.azlyrics.com
b. www.lyrics.com
c. www.metrolyrics.com
2. The data was already tagged.

DATA SET (Contd.)
We created a data set for five emotions. The
training set consists of a little under 1500
songs tagged with their emotions.

Basic Statistics
1. Number of documents per tag (2007 in total):
a. Positive: 975
b. Negative: 1032
2. Average length of documents:
Words: 253.23, Characters: 1007.33

Basic Statistics (Contd.)
Frequency distribution (50 most frequent tokens):
[('I', 19723), ('you', 16177), ('the', 15996), ('to', 10575), ('a', 8214), ('me', 7787),
('and', 6526), ('my', 6330), ('in', 5764), ('And', 5597), ('your', 5162), ('of', 4948),
('it', 4820), ("I'm", 4745), ('that', 3933), ('be', 3795), ('love', 3652), ('is', 3585),
('on', 3480), ('all', 3207), ('You', 3118), ('for', 2936), ('know', 2789), ("don't",
2772), ('this', 2336), ('with', 2322), ('like', 2217), ('just', 2093), ('we', 2047),
('But', 2042), ('so', 2032), ('up', 1934), ('what', 1916), ('can', 1910), ('do', 1857),
("it's", 1766), ('not', 1722), ('The', 1689), ('no', 1636), ('will', 1619), ("can't",
1551), ("I'll", 1534), ('never', 1533), ("you're", 1509), ('have', 1502), ('get', 1501),
('was', 1498), ('are', 1496), ('out', 1486), ('want', 1471)]
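
A frequency list like this one can be produced with a plain token counter. The sketch below is only an illustration: the function name `top_tokens` and the sample lines are assumptions, and it splits on whitespace without lowercasing, which would explain why 'And' and 'and' appear as separate entries above.

```python
from collections import Counter

def top_tokens(lyrics, n=50):
    """Return the n most frequent whitespace-separated tokens across all lyrics."""
    counts = Counter()
    for text in lyrics:
        # Case is kept on purpose, so "And" and "and" are counted separately.
        counts.update(text.split())
    return counts.most_common(n)

# Made-up sample lines, not taken from the data set.
sample = ["I love you", "you and I", "And I know"]
top_two = top_tokens(sample, 2)
```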

Dispersion plot of some words

Stemming
A stemmer script was run on the labelled corpus
to reduce each word to its root form.
We used python-stemmer.
The stemmed text forms the new corpus.
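
To show what the stemming step does to the text, here is a deliberately naive suffix-stripping sketch. It is not python-stemmer and not the Porter algorithm; the rule set `SUFFIXES` and both function names are assumptions made for illustration only.

```python
# Hypothetical, deliberately tiny rule set; a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_lyrics(text):
    """Stem every whitespace-separated token and rejoin them with spaces."""
    return " ".join(naive_stem(w) for w in text.split())

stemmed = stem_lyrics("Singing wanted dreams")
```

A production stemmer handles many cases this sketch gets wrong (for example, it turns "loves" into "lov"), so it should be read as a picture of the idea rather than a replacement for the library.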

Use of keywords
1. A set of keywords was made for each label:
words that are more likely to affect the
song’s label.
2. They were added manually in the Python
script.
3. There are only a few of them, but the sets can
be expanded easily by searching the Web for
similar words.
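
One simple way to turn such keyword sets into features is to count how many tokens of a song fall into each set. The sketch below assumes this counting scheme; the keyword lists and the name `keyword_counts` are hypothetical and are not the ones used in the project.

```python
# Hypothetical keyword sets, chosen only for illustration.
KEYWORDS = {
    "positive": {"love", "happy", "smile"},
    "negative": {"cry", "alone", "hate"},
}

def keyword_counts(text):
    """Count, for each label, how many tokens of the text are in its keyword set."""
    tokens = text.lower().split()
    return {
        label: sum(token in words for token in tokens)
        for label, words in KEYWORDS.items()
    }

features = keyword_counts("I love love you but I cry")
```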

Rhyme Scheme
● Added a function to the Python script that
generates the rhyme scheme of each stanza in
a song’s lyrics
● Ran it on all the songs in a given folder
● Based on the generated rhyme scheme, we
assign a score to the RHYME attribute, which
measures the degree of rhyming in that song.
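
The slides do not say how the rhyme scheme or the RHYME score is computed. The sketch below is one possible reading, under two assumptions: two lines rhyme when their last words end in the same two letters, and the score is the fraction of lines that rhyme with at least one other line in the stanza. All function names and the sample stanza are made up.

```python
import string

def rhyme_key(line, k=2):
    """Last k letters of the line's final word, used as a crude rhyme sound."""
    cleaned = line.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    return words[-1][-k:] if words else ""

def rhyme_scheme(stanza):
    """Label lines A, B, C, ... so that lines with the same key share a letter."""
    labels, scheme = {}, []
    for line in stanza:
        key = rhyme_key(line)
        if key not in labels:
            # Letters wrap around after 26 distinct endings.
            labels[key] = string.ascii_uppercase[len(labels) % 26]
        scheme.append(labels[key])
    return "".join(scheme)

def rhyme_score(stanza):
    """Fraction of lines whose label is shared with at least one other line."""
    scheme = rhyme_scheme(stanza)
    if not scheme:
        return 0.0
    return sum(scheme.count(label) > 1 for label in scheme) / len(scheme)

# Made-up stanza for illustration.
stanza = ["The night is cold", "My heart is old", "I walk alone", "A heart of stone"]
scheme = rhyme_scheme(stanza)
score = rhyme_score(stanza)
```

Spelling-based matching misses rhymes such as "you" and "too"; a phonetic dictionary would be needed to catch those.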

Rhyme Scheme (Contd.)
We observe that certain classes (like romantic
and sad songs) tend to have a high value for the
RHYME attribute.
This attribute is used as a feature for
classification.

tf-idf value of a word
1. Term frequency-inverse document frequency
(tf-idf) reflects how important a word is to a
document in a corpus.
2. The tf-idf value increases in proportion to the
number of times a word appears in a
document and is offset by the number of
other documents in the corpus that contain it.
3. Applied using NLTK.
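
As a worked illustration of the definition, the sketch below computes tf-idf by hand with one common weighting, tf(t, d) * log(N / df(t)). The exact variant used in the project is not stated in the slides, so the formula, the function name `tf_idf` and the sample documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Made-up two-document corpus.
weights = tf_idf(["love love you", "hate you"])
```

Here "you" occurs in every document, so its weight is zero, while "love" is frequent in the first document and absent from the second, so it gets the highest weight there.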

Using POS tags as features
We assume that different genres of songs
will also differ in the categories (POS)
of words they use.
For each POS tag category (45 such categories in
the Penn Treebank), we count the number of words
and normalize the count.

Using POS tags as features (Contd.)
Steps:
1) Remove punctuation and expand contractions
(I’m -> I am).
2) Tokenize
3) Do POS tagging
4) Divide the frequency of each POS tag by the
total number of words
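
Step 4 can be sketched as follows. The function assumes the tokens have already been tagged, for example as the list of (word, tag) pairs returned by NLTK's `pos_tag`; the name `pos_features` and the hand-tagged sample are assumptions for illustration.

```python
from collections import Counter

def pos_features(tagged_tokens):
    """Map each POS tag to its share of the tokens in one song."""
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(tag for _, tag in tagged_tokens)
    return {tag: n / total for tag, n in counts.items()}

# Hand-tagged example using Penn Treebank tags (not tagger output).
tagged = [("I", "PRP"), ("am", "VBP"), ("with", "IN"), ("you", "PRP")]
features = pos_features(tagged)
```

Tags that never occur in a song are simply absent from the dictionary and can be treated as zero when building a fixed-length feature vector.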

Shifting to SVM
Applied a linear SVM in scikit-learn after tf-idf
vectorizing.
The features used include: 1) category
keywords, 2) rhyme scheme, 3) POS tagging.
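
A minimal sketch of the tf-idf plus linear SVM step, assuming scikit-learn's `TfidfVectorizer` and `LinearSVC` with default settings. The toy lyrics and labels are made up, and the keyword, rhyme and POS features are left out here; the slides do not say how those were combined with the tf-idf vectors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training examples, one string per song.
lyrics = [
    "love you happy smile",
    "happy love dance",
    "cry alone pain",
    "pain tears cry goodbye",
]
labels = ["positive", "positive", "negative", "negative"]

# Vectorize with tf-idf, then fit a linear SVM on the resulting vectors.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(lyrics, labels)

prediction = model.predict(["love smile"])[0]
```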

Training and Test Data Set
● Used 20 percent of the data for testing and 80
percent for training.
● The test data was selected uniformly, taking 1
in every 5 songs.
● Reducing the training share further would
leave too few samples to build the model.
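
The 80/20 split by taking every fifth item can be sketched as below. Which position within each group of five goes to the test set is not stated in the slides; this sketch assumes the first one, and the function name is hypothetical.

```python
def split_every_fifth(items):
    """Put every fifth item (positions 0, 5, 10, ...) in the test set."""
    train = [item for i, item in enumerate(items) if i % 5 != 0]
    test = [item for i, item in enumerate(items) if i % 5 == 0]
    return train, test

# With 2007 documents this gives 1605 training and 402 test items.
train, test = split_every_fifth(list(range(2007)))
```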

Training and Test Data Set (Contd.)
● One shortcoming of these techniques is the
assumption that random sampling yields
independent samples and an even draw of
samples that is free of any bias in the
dataset.

Validation
Used 5-fold and 1-fold cross validation using
NLTK.
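
The fold construction can be sketched in plain Python as below. Two assumptions are made: folds are interleaved rather than contiguous, and "1-fold" means training and testing on the same data, which is how the results slide describes it (self validation). The function name is hypothetical.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        if k == 1:
            # A single fold has no held-out data: train and test on everything.
            yield indices, test
        else:
            train = [i for i in indices if i % k != fold]
            yield train, test

folds = list(k_fold_indices(10, 5))
self_validation = list(k_fold_indices(3, 1))
```

Accuracy is then averaged over the folds; with k = 1 the model is scored on the data it was trained on, so a much higher figure is expected.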

Results
For the test set:
Accuracy: 0.721393034826
            Precision   Recall   F-score
Positive    0.734       0.666    0.698
Negative    0.711       0.772    0.740
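
The F-scores follow from the precision and recall figures if the F-score is taken to be the usual F1 measure (the harmonic mean of the two). That reading is an assumption, but it reproduces both reported values, as this short check shows.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall values as reported for the test set.
positive_f = f_score(0.734, 0.666)
negative_f = f_score(0.711, 0.772)
```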

Results (Contd.)
5-fold cross validation
Method              Accuracy
1) Tf-idf alone     71.99%
2) Tf-idf + rhyme   72.5%

Results (Contd.)
1-fold cross validation accuracy (self
validation): 97.66%
Average processing time per document:
Only SVM: 0.001s
All features: 0.182s

Lyrics different from just sentences
● A song may contain a series of negative
sentences but end on a positive/uplifting note
● The mood/meaning of a song is not clear from
considering its sentences independently
● A love song may express how happy the singer
was in a relationship, with the sadness of the
breakup expressed only at the end

Lyrics different from just sentences (Contd.)
● Lyrics can be VERY ABSTRACT!
What’s the matter with the clothes I’m wearing?
Can’t you tell that your tie’s too wide?
Maybe I should buy some old tab collars?
Welcome back to the age of jive.
● Hard to figure out that this stanza expresses
positive emotion

Lyrics different from just sentences (Contd.)
● A song may express positive emotion about
negative things
● E.g., rap songs frequently express positive
emotion about murder, shooting, drugs,
guns
Whole new level of confusion!

Problems
Text inaccuracies: spelling errors
Use of slang
Metaphors, sarcasm
Cannot capture features of the song (pace, beat,
melody, etc.) just from lyrics
These features are important, and we have no
solution for them

Problems (Contd.)
The way a song is sung and its music affect the
mood/genre of the song

References
http://users.cis.fiu.edu/~lli003/Music/cla/34.pdf
http://www.cs.berkeley.edu/~schasins/papers/identifyingEmotionalPolarity.pdf
http://www.joics.com/publishedpapers/2012_9_1_35_44.pdf
http://stephaniehiga.com/posts/analyzing-
