Lyric Classification by Pronouns
By Zhiyu Huo and Matthew Luce
Problem Introduction
• Sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document.
• In our project, we use pronoun-based features to classify lyrics.
Problem Introduction
• We work on four classification topics:
  – Mood:
    • Self-Centered (S-C)
    • Self-Reflective (S-R)
  – Target Figure:
    • Speaking Globally (SP-G)
    • Speaking about a Relationship (SP-R)
• For each category, we develop a binary classifier that decides whether or not a song belongs to it.
Dataset
• 222 songs from different genres and singers were selected and manually labeled by the two authors.
Features
• We define features based on pronouns:
  – Pronoun frequency (PF)
  – Regular expression (RE)
  – Contextual (CON)
PF
• I: {I, me, my, mine}
• You: {you, ya, your, yours}
• He/She: {he, him, his, she, her, hers}
• We: {we, us, our, ours}
• They: {they, them, their, theirs}
• First: {I, me, my, mine, we, us, our, ours}
• Second: {you, ya, your, yours}
• Third: {he, him, his, she, her, hers, they, them, their, theirs}
• Singular: {I, me, my, mine, he, him, his, she, her, hers, you, ya, your, yours}
• Plural: {we, us, our, ours, you, ya, your, yours, they, them, their, theirs}
• Feature value: $f_i = \frac{N_i}{N}$, where $N_i$ is the number of occurrences of the words in set $i$
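The PF feature can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function name `pf_features`, the lowercase word tokenizer (which splits contractions such as "I'm" into "i" and "m"), and taking $N$ to be the total number of word tokens are all assumptions, since the slide gives only $f_i = N_i / N$.

```python
import re

# Pronoun sets copied from the PF slide, lowercased for matching.
PRONOUN_GROUPS = {
    "I": {"i", "me", "my", "mine"},
    "You": {"you", "ya", "your", "yours"},
    "He/She": {"he", "him", "his", "she", "her", "hers"},
    "We": {"we", "us", "our", "ours"},
    "They": {"they", "them", "their", "theirs"},
    "First": {"i", "me", "my", "mine", "we", "us", "our", "ours"},
    "Second": {"you", "ya", "your", "yours"},
    "Third": {"he", "him", "his", "she", "her", "hers",
              "they", "them", "their", "theirs"},
    "Singular": {"i", "me", "my", "mine", "he", "him", "his",
                 "she", "her", "hers", "you", "ya", "your", "yours"},
    "Plural": {"we", "us", "our", "ours", "you", "ya", "your", "yours",
               "they", "them", "their", "theirs"},
}


def pf_features(lyrics: str) -> dict[str, float]:
    """Return f_i = N_i / N for every pronoun set.

    Assumption (not stated on the slide): N is the total number of
    word tokens, where a token is a run of ASCII letters.
    """
    tokens = re.findall(r"[a-z]+", lyrics.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in PRONOUN_GROUPS}
    return {
        name: sum(tok in words for tok in tokens) / total
        for name, words in PRONOUN_GROUPS.items()
    }


print(pf_features("I miss her so much.")["Singular"])  # → 0.4
```

Because "you" and "ya" appear in both the Singular and Plural sets, they contribute to both values, mirroring the ambiguity of English "you".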
RE
• r"I .* her": e.g. "I miss her so much."
• r"You .* my": e.g. "You got my attention."
• r"^I .*": e.g. "I am riding a tank."
• r"They .* us": e.g. "They wanna control us."
• r"Let us .*": e.g. "Let us share the world."
• r"You .* me": e.g. "You raise me up."
• r"We .*you": e.g. "We will rock you for free."
• ……
• The value of each element in the feature vector is $f_k = \frac{N_k}{N_S}$
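A minimal sketch of the RE feature vector, using Python's `re` module. It assumes each non-empty lyric line is one sentence, that $N_S$ is the number of such lines, and that $N_k$ is the number of lines in which pattern $k$ matches somewhere (via `re.search`). Only the seven patterns shown on the slide are included; the elided ones are not guessed. The name `re_features` is hypothetical.

```python
import re

# The patterns listed on the RE slide (the slide elides the rest).
RE_PATTERNS = [
    r"I .* her",
    r"You .* my",
    r"^I .*",
    r"They .* us",
    r"Let us .*",
    r"You .* me",
    r"We .*you",
]
COMPILED = [re.compile(p) for p in RE_PATTERNS]


def re_features(lyrics: str) -> list[float]:
    """Return f_k = N_k / N_S for each pattern k.

    Assumptions: one sentence per non-empty line; N_S counts those lines;
    N_k counts the lines that contain at least one match of pattern k.
    """
    sentences = [line.strip() for line in lyrics.splitlines() if line.strip()]
    if not sentences:
        return [0.0] * len(COMPILED)
    return [
        sum(1 for s in sentences if pattern.search(s)) / len(sentences)
        for pattern in COMPILED
    ]
```

The patterns are case-sensitive as written, so a lyric line starting with lowercase "i" would not match r"^I .*"; whether the authors normalized case is not stated.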
CON
context = [[1, 22], [25, 25], [25, 22], [23, 22], [22, 23], [23, 23], [1, 1], [24, 23], [23, 1], [1, 24], [24, 24], [22, 1], [23, 10], [10, 22], [22, 22], [7, 24], [26, 23], [23, 11], [23, 24], [23, 25], [25, 23], [24, 7], [16, 16], [5, 24], [22, 25], [26, 24], [24, 22], [11, 1], [22, 24], [24, 11], [13, 23], [7, 23], [1, 23], [27, 22], [11, 24], [23, 13], [23, 26], [26, 22], [26, 26], [13, 13], [13, 22], [1, 15], [15, 18], [27, 11], [11, 22], [22, 26], [27, 1], [24, 27], [27, 23], [7, 25], [24, 25], [25, 24], [24, 1], [1, 26], [22, 11], [11, 23], [23, 27], [27, 27], [27, 24], [22, 10], [23, 16], [10, 11], [16, 23], [25, 12], [12, 22], [26, 12], [12, 25], [16, 1], [12, 24], [1, 25], [22, 27], [15, 25], [25, 1], [24, 26], [25, 13], [13, 24], [24, 5], [1, 16], [16, 24], [25, 10], [10, 23], [16, 26], [15, 24], [24, 13], [1, 11], [26, 13], [11, 11], [24, 16], [26, 1]]
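The slide lists the `context` pairs but does not say what the integers encode. Purely as an illustration, the sketch below assumes that each lyric has already been mapped to a sequence of integer codes (the mapping is not given) and that each pair is an ordered pair of codes for adjacent tokens; the feature is then the fraction of adjacent code pairs equal to each listed pair. The function `con_features` is hypothetical, and only the first five pairs of the list are used.

```python
from collections import Counter

# First five pairs of the slide's `context` list (the full list is longer).
CONTEXT = [[1, 22], [25, 25], [25, 22], [23, 22], [22, 23]]


def con_features(codes: list[int],
                 context: list[list[int]] = CONTEXT) -> list[float]:
    """Hypothetical contextual feature.

    Assumes `codes` is a lyric already mapped to integer codes (mapping
    not defined on the slide) and that each [a, b] in `context` is an
    ordered pair of codes for adjacent tokens. Returns, for each listed
    pair, the fraction of adjacent pairs in `codes` equal to it.
    """
    bigrams = Counter(zip(codes, codes[1:]))
    total = max(len(codes) - 1, 1)  # avoid division by zero on short input
    return [bigrams[(a, b)] / total for a, b in context]


print(con_features([1, 22, 23, 22, 23]))  # → [0.25, 0.0, 0.0, 0.25, 0.5]
```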
Classifier - Boosting
• Boosting: using multiple weak classifiers to build a strong one.
• For each of the three features (PF, RE, CON), we train a linear SVM, and then build a boosting classifier on top of these SVMs.
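The slide does not say which boosting variant was used. A minimal sketch, assuming discrete AdaBoost over a fixed pool of already-trained classifiers (standing in for the three linear SVMs on PF, RE and CON), each returning +1 (in the category) or -1 (not in it). Each round picks the pool member with the lowest weighted error, weights it by $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, and up-weights the songs it got wrong. The name `adaboost_over_pool` is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A weak classifier maps one example to +1 (in category) or -1 (not).
Classifier = Callable[[object], int]


def adaboost_over_pool(
    pool: Sequence[Classifier],
    X: Sequence[object],
    y: Sequence[int],
    rounds: int = 10,
) -> Callable[[object], int]:
    """Discrete AdaBoost over a fixed pool of pre-trained classifiers."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble: List[Tuple[float, Classifier]] = []
    for _ in range(rounds):
        # Weighted training error of every pool member.
        errors = [sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
                  for h in pool]
        best_err = min(errors)
        if best_err >= 0.5:  # nothing beats chance on the current weights
            break
        best = pool[errors.index(best_err)]
        err = min(max(best_err, 1e-10), 1 - 1e-10)  # keep log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        if best_err == 0:  # a perfect member: nothing left to reweight
            break
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * best(xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x: object) -> int:
        # Sign of the alpha-weighted vote (ties and empty ensembles give +1).
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1

    return predict


# Toy usage: three stand-in classifiers, each wrong on a different example.
def wrong_on(k: int) -> Classifier:
    return lambda x: -1 if x == k else 1


combined = adaboost_over_pool(
    [wrong_on(0), wrong_on(1), wrong_on(2)], [0, 1, 2], [1, 1, 1], rounds=3)
print([combined(x) for x in [0, 1, 2]])  # → [1, 1, 1]
```

In the toy run every individual classifier misses one example, while the weighted vote classifies all three correctly, which is the effect boosting is meant to provide.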
Experiment & Result
• Tools: Python NLTK & MATLAB
• Compared: PF, RE, CON, Boosting
• Categories: S-C, S-R, SP-G, SP-R
• Recall and precision:
  Recall     0.66  0.76  0.57  0.96
  Precision  0.31  0.40  0.55  0.71
  Recall     0.66  0.67  0.61  0.99
  Precision  0.33  0.31  0.53  0.69
  Recall     0.48  0.56  0.40  0.87
  Precision  0.29  0.50  0.48  0.74
  Recall     0.76  0.64  0.59  0.99
  Precision  0.36  0.44  0.64  0.81
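Each binary classifier is scored by precision and recall. For reference, a minimal helper computing both from true and predicted labels, assuming labels are +1 (in the category) and -1 (not in it); `precision_recall` is a hypothetical name, and the slides do not state how the songs were split for evaluation.

```python
def precision_recall(y_true: list[int], y_pred: list[int],
                     positive: int = 1) -> tuple[float, float]:
    """Return (precision, recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives
    return precision, recall


print(precision_recall([1, 1, -1, -1, -1], [1, 1, 1, 1, -1]))  # → (0.5, 1.0)
```

A classifier that labels many songs as positive tends toward high recall and low precision, as in the toy call above.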
Difficulties and Challenges
• Feature selection:
  – Moving from frequency-based to context-based features
• Dataset bias:
  – All data is human-labeled (by us)
  – Mood metric
Conclusion
• We developed and tested three kinds of pronoun features. In addition, we trained a boosting classifier on top of them to obtain better results.
• Pronoun features are more effective for the “Target Figure” problem than for the “Mood” problem.
• This kind of detector could be used for song search.
