Generating Ground Truth for
Music Mood Classification
Using Mechanical Turk
Jin Ha Lee & Xiao Hu
JCDL 2012
Mood: a relatively long lasting
and stable emotional state (Meyer, 1956)
Emotion?
Affect?
Music mood
• Recently received a lot of attention in
MIR (Music Information Retrieval) domain
• “Audio Music Mood Classification” task in
MIREX (Music Information Retrieval
Evaluation eXchange), starting in 2007
• Critical for developing MDL
• Evaluation is based
on ground truth
[Figure: mood labels Passionate, Bittersweet, Bittersweet, Bittersweet]
More is better!
However, generating ground truth
based on human input is expensive
and time consuming
How is it done in MIREX?
• A web-based survey system called E6K
• Invitations posted to MIREX and music-ir
mailing lists in order to recruit
volunteers
Can we use the
CROWD
instead of
MUSIC
EXPERTS?
Is there a
better way?
1. How do music mood classification results
obtained from Mechanical Turk
compare to those collected from music
experts in MIREX?
2. How different or similar are the
evaluation outcomes for the MIREX
AMC task when based on ground truth
collected from Mechanical Turk vs. E6K?
Amazon Mechanical Turk (MTurk)
Task Requester
Workers (Turkers)
Cluster1 passionate, rousing, confident, boisterous, rowdy
Cluster2 cheerful, fun, rollicking, sweet, amiable/good natured
Cluster3 bittersweet, poignant, wistful, literate, autumnal, brooding
Cluster4 humorous, silly, campy, quirky, whimsical, witty, wry
Cluster5 aggressive, intense, fiery, tense/anxious, volatile, visceral
TASK:
Listen to 30-second
music clips →
Select one of the five
mood clusters ↓
Qualification test
Consistency check
Review process
Basic Stats
1250 songs x 2 judgments = 2500 unique mood judgments
186 HITs collected - 86 HITs rejected = 100 HITs accepted
1 HIT = 25 songs
Stats on Collecting Data
                                          E6K (Evalutron 6000)                         MTurk
Average Time Spent on Each Music Clip     21.54 seconds                                17.46 seconds
Total Time for Collecting All Judgments   38 days (+ additional in-house assessment)   19 days
Cost for Collecting All Judgments         $0                                           $60.50
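The collection figures above imply a few derived quantities. The sketch below is only back-of-the-envelope arithmetic on the numbers from the slides. It assumes the $60.50 is spread over the 2500 accepted judgments (the slides do not say whether rejected HITs were paid) and that the 17.46-second average applies to every accepted judgment. All variable names are illustrative.

```python
# Figures copied from the slides; everything derived below is illustrative.
SONGS = 1250
JUDGMENTS_PER_SONG = 2
SONGS_PER_HIT = 25
HITS_COLLECTED = 186
HITS_REJECTED = 86
MTURK_COST_USD = 60.50
MTURK_SECONDS_PER_CLIP = 17.46

total_judgments = SONGS * JUDGMENTS_PER_SONG         # judgments needed
hits_accepted = HITS_COLLECTED - HITS_REJECTED       # HITs that passed review
judgments_from_hits = hits_accepted * SONGS_PER_HIT  # judgments actually kept

# Share of submitted HITs that were rejected during review.
rejection_rate = HITS_REJECTED / HITS_COLLECTED

# Assumption: the whole cost is attributed to accepted judgments only.
cost_per_judgment = MTURK_COST_USD / total_judgments

# Assumption: every accepted judgment took the reported average time.
listening_hours = total_judgments * MTURK_SECONDS_PER_CLIP / 3600

print(f"accepted HITs:        {hits_accepted}")
print(f"rejection rate:       {rejection_rate:.1%}")
print(f"cost per judgment:    ${cost_per_judgment:.4f}")
print(f"total listening time: {listening_hours:.1f} hours")
```

Under these assumptions, the accepted HITs account for exactly the required number of judgments, and close to half of the submitted HITs were rejected.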
Comparison of E6K and MTurk data
Number of Judgments and Distribution across Clusters
Cluster    E6K           MTurk         Diff. in % (E6K-MTurk)
Cluster1   405 (16.4%)   450 (18.0%)   -1.6%
Cluster2   472 (19.1%)   536 (21.4%)   -2.3%
Cluster3   542 (22.0%)   622 (24.9%)   -2.9%
Cluster4   412 (16.7%)   367 (14.7%)   2.0%
Cluster5   400 (16.2%)   403 (16.1%)   0.1%
Other      237 (9.6%)    122 (4.9%)    4.7%
Total      2468          2500          -
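The percentages and differences in this table can be recomputed from the raw counts. The following is a minimal sketch, assuming each percentage is the cluster count divided by that source's total and the difference is the E6K share minus the MTurk share. The names `shares`, `e6k`, and `mturk` are illustrative.

```python
# Raw judgment counts per cluster, copied from the table.
e6k = {"Cluster1": 405, "Cluster2": 472, "Cluster3": 542,
       "Cluster4": 412, "Cluster5": 400, "Other": 237}
mturk = {"Cluster1": 450, "Cluster2": 536, "Cluster3": 622,
         "Cluster4": 367, "Cluster5": 403, "Other": 122}


def shares(counts):
    """Return each label's share of the total, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


e6k_pct = shares(e6k)
mturk_pct = shares(mturk)

# Difference in percentage points, E6K minus MTurk.
diffs = {label: e6k_pct[label] - mturk_pct[label] for label in e6k}

for label in e6k:
    print(f"{label:<9} {e6k_pct[label]:5.1f}%  {mturk_pct[label]:5.1f}%  "
          f"{diffs[label]:+.1f}")
```

The largest gap is in the "Other" category, which MTurk workers chose about half as often as the E6K volunteers.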
Distribution of Agreement
Cluster E6K MTurk Both
Cluster1 121 89 29
Cluster2 130 131 44
Cluster3 163 216 91
Cluster4 121 85 42
Cluster5 126 121 64
Total 661 642 270
Confusion among the Clusters
Clusters                Disagreed in E6K   Disagreed in MTurk
Cluster 1 & Cluster 2   20                 95
Cluster 2 & Cluster 4   31                 86
Cluster 1 & Cluster 5   13                 74
⁞                       ⁞                  ⁞
Cluster 3 & Cluster 4   6                  27
Cluster 2 & Cluster 5   1                  22
Cluster 3 & Cluster 5   1                  20
Total                   253                595
[Figure: Clusters 1 to 5 positioned on Russell’s model]
System Performance
E6K   Average accuracy   MTurk   Average accuracy
CL    0.65               GT      0.66
GT    0.64               CL      0.63
TL    0.64               TL      0.63
ME1   0.61               ME1     0.57
ME2   0.61               ME2     0.57
IM2   0.57               IM2     0.57
KL1   0.56               KL1     0.55
IM1   0.53               IM1     0.54
KL2   0.29               KL2     0.29
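One simple way to quantify how similar the two system rankings are is a rank correlation. The sketch below computes Spearman's rho over the accuracies in the table, with tied scores sharing their mean rank. This is not the TK-HSD analysis used in the slides; it is a swapped-in illustration, and the function names are hypothetical.

```python
# Average accuracy per system under each ground truth, copied from the table.
e6k_acc = {"CL": 0.65, "GT": 0.64, "TL": 0.64, "ME1": 0.61, "ME2": 0.61,
           "IM2": 0.57, "KL1": 0.56, "IM1": 0.53, "KL2": 0.29}
mturk_acc = {"GT": 0.66, "CL": 0.63, "TL": 0.63, "ME1": 0.57, "ME2": 0.57,
             "IM2": 0.57, "KL1": 0.55, "IM1": 0.54, "KL2": 0.29}


def average_ranks(scores):
    """Rank systems from best (1) to worst; tied scores get their mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every system tied with the one at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    systems = sorted(a)
    return pearson([ra[s] for s in systems], [rb[s] for s in systems])


rho = spearman(e6k_acc, mturk_acc)
print(f"Spearman rho between E6K and MTurk rankings: {rho:.3f}")
```

A value close to 1 would indicate that the two ground truths order the systems in nearly the same way. It does not say whether the differences between systems are statistically significant.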
TK-HSD Rank Comparison
[Figure: rank comparison, E6K vs. MTurk]
Conclusion
• Overall the human judgments from E6K and
MTurk showed similar patterns:
– Judgment distribution across five mood clusters
– Agreement distribution across clusters
– Confusion among clusters
• System performance rankings from E6K and
MTurk were also comparable
Conclusion (Cont’d.)
• However, the combined ground truth from E6K
and MTurk is only about 60% of the size of the
original E6K ground truth
• Mood is a highly subjective feature for
describing and organizing music
• Other means for judging the moods should be
explored (e.g., ranking)
Future work
• In-depth interviews with users to investigate
factors affecting people’s judgments of music
mood
• More controlled study with different user
groups
Questions?
