Sentiment analysis of song lyrics (Sep 2014)
Deepanjan Kundu (120050009)
Siddhartha Dutta (120040005)
Pratyaksh Sharma (120050019)
Prateesh Goyal (120050013)
1
Input/Output
Input: song lyrics
Output: Sentiment exhibited by the song
Refined to 2 labels:
(Happy, Romantic): positive
(Sad, Angry): negative
2
Example of output
Need to grow older with a girl like you
Finally see you are naturally
The one to make it so easy
When you show me the truth
Yeah, I’d rather be with you
Say you want the same thing too
TAGS: (ROMANCE)positive
3
DATA SET
1. Used a web crawler to collect lyrics from a
few listed websites and used them as our
data set. Some of the sites were:
a. www.azlyrics.com
b. www.lyrics.com
c. www.metrolyrics.com
2. The data was already tagged.
4
DATA SET(Contd.)
We created a data set for five emotions. The
training set consists of a little under
1500 songs tagged with their emotions.
5
Basic Statistics
1. Number of documents: 2007, split by tag:
a. Positive: 975
b. Negative: 1032
2. Average length of documents:
Words: 253.23 Characters: 1007.33
6
Basic Statistics(Contd.)
Frequency distribution(top 50 frequency):
[('I', 19723), ('you', 16177), ('the', 15996), ('to', 10575), ('a', 8214), ('me', 7787),
('and', 6526), ('my', 6330), ('in', 5764), ('And', 5597), ('your', 5162), ('of', 4948),
('it', 4820), ("I'm", 4745), ('that', 3933), ('be', 3795), ('love', 3652), ('is', 3585),
('on', 3480), ('all', 3207), ('You', 3118), ('for', 2936), ('know', 2789), ("don't",
2772), ('this', 2336), ('with', 2322), ('like', 2217), ('just', 2093), ('we', 2047),
('But', 2042), ('so', 2032), ('up', 1934), ('what', 1916), ('can', 1910), ('do', 1857),
("it's", 1766), ('not', 1722), ('The', 1689), ('no', 1636), ('will', 1619), ("can't",
1551), ("I'll", 1534), ('never', 1533), ("you're", 1509), ('have', 1502), ('get', 1501),
('was', 1498), ('are', 1496), ('out', 1486), ('want', 1471)]
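A frequency distribution like the one above can be produced with a plain token counter. This is a minimal sketch, assuming whitespace tokenization; the `top_words` helper and the toy lyrics are hypothetical, since the slides do not show the actual script. As in the list above, 'And' and 'and' are counted separately because no lowercasing is applied.

```python
from collections import Counter

def top_words(documents, n=50):
    """Return the n most frequent whitespace-separated tokens (case-sensitive)."""
    counts = Counter()
    for text in documents:
        counts.update(text.split())
    return counts.most_common(n)

# Toy lyrics, not taken from the data set.
songs = ["I love you and I know", "And you love me"]
print(top_words(songs, n=3))
```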
7
Dispersion plot of some
words
8
Stemming
A stemmer script was run on the labelled corpus
to extract the root words.
Used python-stemmer.
The stemmed text now forms the new corpus.
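To illustrate what stemming does to the corpus, here is a deliberately naive suffix-stripping sketch. It is a stand-in, not the algorithm used by python-stemmer; the suffix list and the `naive_stem` and `stem_lyrics` helpers are assumptions made for illustration only.

```python
# Illustrative suffix list, checked in this order (an assumption, not from the slides).
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower

def stem_lyrics(text):
    """Stem every whitespace-separated word of a lyric."""
    return " ".join(naive_stem(word) for word in text.split())

print(stem_lyrics("Loving you wanted dreams"))
```

A real stemmer handles many more cases (for example doubled consonants and irregular endings), so this sketch only shows the shape of the step.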
9
10
Use of keywords
1. A set of keywords was made for each label:
words that are more likely to affect the
song’s label.
2. They were added manually in the Python
script.
3. The lists are small but can be expanded
easily by searching for more such words on the Web.
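The keyword feature can be sketched as a per-label count of keyword hits. The word lists below are hypothetical (the slides do not list the actual keywords), and `keyword_counts` is an illustrative helper rather than the authors' code.

```python
# Hypothetical keyword lists; the real ones were hand-written in the script.
KEYWORDS = {
    "positive": {"love", "happy", "smile", "kiss"},
    "negative": {"cry", "hate", "alone", "pain"},
}

def keyword_counts(text):
    """Count how many tokens of the lyric fall in each label's keyword set."""
    tokens = text.lower().split()
    return {
        label: sum(token in words for token in tokens)
        for label, words in KEYWORDS.items()
    }

print(keyword_counts("Happy love song that makes me smile"))
```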
11
12
Rhyme Scheme
● Added a function to the Python script to
generate the rhyme scheme of stanzas in a song’s
lyrics
● Ran it over all the songs in a given folder
● Based on the generated rhyme scheme, we
give a score to the RHYME attribute, which
essentially tells the degree of rhyming in
that song.
13
Rhyme Scheme
We observe that certain classes (such as romantic
and sad songs) tend to have high values for the
RHYME attribute.
This attribute is used as a feature for classification.
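One way such a rhyme feature could be computed is sketched below. It is an assumption-heavy illustration: two lines are treated as rhyming when their last words share the same final two letters (a crude spelling test, not a phonetic one), and the score is the fraction of lines that rhyme with at least one other line. The slides do not specify the actual rule or scoring, so `rhyme_scheme` and `rhyme_score` are hypothetical.

```python
import string

def rhyme_scheme(stanza, tail=2):
    """Label each line with a letter; lines whose last words end in the
    same `tail` letters receive the same label."""
    labels, seen = [], {}
    for line in stanza.strip().splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        last = words[-1].lower().strip(string.punctuation)
        key = last[-tail:]
        if key not in seen:
            seen[key] = string.ascii_uppercase[len(seen) % 26]
        labels.append(seen[key])
    return "".join(labels)

def rhyme_score(scheme):
    """Fraction of lines whose label occurs more than once in the stanza."""
    if not scheme:
        return 0.0
    return sum(scheme.count(label) > 1 for label in scheme) / len(scheme)

# Toy stanza, invented for illustration.
stanza = """The stars are out tonight
I hold you close and tight
We dance until the day
And chase the clouds away"""
scheme = rhyme_scheme(stanza)
print(scheme, rhyme_score(scheme))
```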
14
15
tf-idf value of a word
1. Term frequency-inverse document frequency
reflects how important a word is to a
document in a corpus.
2. The tf-idf value increases in proportion to the
number of times a word appears in a
document and decreases with the number of
other documents in which it appears.
3. Applied using NLTK.
16
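The weighting described above can be written out directly. This sketch uses one common variant, tf = count / document length and idf = log(N / document frequency); library implementations often differ in smoothing and normalization, so the numbers will not necessarily match what the authors obtained. The `tf_idf` helper and the toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document, using
    tf = count / document length and idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

docs = ["love love you", "hate you", "love me"]
scores = tf_idf(docs)
print(scores[1])
```

In the second toy document, "hate" outweighs "you" because "you" also appears in another document.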
Using POS tags as features
We assume that different genres of songs
will also differ in the categories (POS)
of words they use.
We count the number of words (normalized)
for each POS tag category (45 such categories in
the Penn Treebank).
17
Using POS tags as features
Steps:
1) Remove punctuation and expand contractions
(I’m -> I am).
2) Tokenize.
3) Do POS tagging.
4) Count the frequency of each POS tag and divide
by the total number of words.
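The steps above can be sketched as follows. The contraction table is a small illustrative sample, and step 3 is replaced by hand-written tags so the example has no external dependencies; in practice a tagger such as NLTK's `pos_tag` would supply the (word, tag) pairs. `preprocess` and `pos_features` are hypothetical helper names.

```python
import re
from collections import Counter

# Small illustrative contraction table; a real script would need a fuller one.
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "it's": "it is"}

def preprocess(text):
    """Steps 1-2: expand contractions, drop punctuation, split on whitespace."""
    text = text.lower().replace("’", "'")
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    return re.sub(r"[^\w\s]", " ", text).split()

def pos_features(tagged_tokens):
    """Step 4: frequency of each POS tag divided by the total number of words."""
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(tag for _, tag in tagged_tokens)
    return {tag: count / total for tag, count in counts.items()}

tokens = preprocess("I’m happy, I don't cry")
# Step 3 stand-in: tags written by hand (assumption), normally from a POS tagger.
tagged = [("i", "PRP"), ("am", "VBP"), ("happy", "JJ"), ("i", "PRP")]
print(tokens)
print(pos_features(tagged))
```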
18
19
Shifting to SVM
Applied a linear SVM in scikit-learn after tf-idf
vectorizing.
The features used include: 1) category
keywords, 2) rhyme scheme, 3) POS tagging.
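A minimal sketch of the tf-idf plus linear SVM step, assuming scikit-learn's `TfidfVectorizer` and `LinearSVC` chained with `make_pipeline`. The training lyrics and labels are invented, and the keyword, rhyme and POS features are not included here; combining them with the tf-idf matrix would need extra code that the slides do not describe.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy lyrics and labels, invented for illustration only.
train_texts = [
    "love you forever and smile",
    "happy days with you my love",
    "cry alone in the pain",
    "hate the lonely pain and tears",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Vectorize with tf-idf, then fit a linear SVM on the resulting vectors.
model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
model.fit(train_texts, train_labels)

predictions = model.predict(["my love makes me smile", "tears and pain"])
print(list(predictions))
```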
20
Training and Test Data Set
● Used 20 percent of the data for testing and 80
percent for training.
● The data was selected uniformly: 1 in every
5 songs was held out for testing.
● If the training share is lowered further, the
model is built from too few samples.
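A 1-in-5 systematic split that matches the stated 80/20 proportions might look like this. It assumes that every fifth song goes to the test set and the rest to training; which position within each group of five is held out is an arbitrary choice here, and `systematic_split` is a hypothetical helper.

```python
def systematic_split(items, every=5):
    """Send every `every`-th item to the test set and the rest to training."""
    train, test = [], []
    for index, item in enumerate(items):
        if index % every == every - 1:
            test.append(item)
        else:
            train.append(item)
    return train, test

# Placeholder song identifiers.
songs = [f"song_{i}" for i in range(10)]
train, test = systematic_split(songs)
print(len(train), test)
```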
21
Contd.
● One shortcoming I have always found in these
techniques is the assumption that random
sampling yields independent samples and a
smooth draw that carries no bias from the
dataset.
22
Validation
Used 5-fold and 1-fold cross-validation using
NLTK.
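For reference, k-fold cross-validation partitions the data into k folds and uses each fold once as the test set. The sketch below generates the index sets with contiguous folds and no shuffling, which is an assumption; `k_fold_indices` is a hypothetical helper and not an NLTK function. What the slides call 1-fold (self validation) corresponds to training and testing on the same data, which this function does not cover.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print(len(train), test)
```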
23
Results
For test:
accuracy: 0.721393034826
Class      Precision  Recall  F-score
positive   0.734      0.666   0.698
negative   0.711      0.772   0.740
24
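The reported F-scores are consistent with the harmonic mean of the corresponding precision and recall values. A short check, with `f_score` as an illustrative helper name:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

positive_f = round(f_score(0.734, 0.666), 3)  # positive class
negative_f = round(f_score(0.711, 0.772), 3)  # negative class
print(positive_f, negative_f)
```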
Results(Contd.)
5-fold cross validation
Method              Accuracy
1) Tf-idf alone     71.99%
2) Tf-idf + rhyme   72.5%
25
Results (Contd.)
1-fold Cross Validation Accuracy (Self Validation) = 97.66%
Average processing time per document:
(Only SVM): 0.001s
(All features): 0.182s
26
CONCLUSIONS
27
Lyrics different from just
sentences
● Song may contain series of negative
sentences but end on positive/uplifting note
● Mood/meaning of song not clear just by
considering sentences independently
● Love song may express how happy the singer
was in a relationship but sadness of breakup
expressed in the end
28
Lyrics different from just
sentences
● Lyrics can be VERY ABSTRACT!
What’s the matter with the clothes I’m wearing?
Can’t you tell that your tie’s too wide?
Maybe I should buy some old tab collars?
Welcome back to the age of jive.
● Hard to figure out that this stanza expresses
positive emotion
29
Lyrics different from just
sentences
● Song may express positive emotion about
negative things
● E.g. rap songs frequently express positive
emotion about murder, shooting, drugs,
guns
Whole new level of confusion!
30
Problems
Text inaccuracies: spelling errors
Use of slang
Metaphors, sarcasm
Cannot capture features (pace, beat, melody, etc.)
of the song just from lyrics
These features are important - no solution
31
Problems
The way of singing/music affects mood/genre
of the song
32
