Large-Scale Machine Listening and Automatic Mood Labeling for Music Discovery in Consumer Applications
Feel The Music: Sound and Emotion
Peter DiMaria, Gracenote, Inc.
Ching-Wei Chen, Gracenote, Inc.
Overview
• Gracenote Background
• The Challenges of Music Discovery
• Sonic Mood Overview
  – Sonic Mood Taxonomy
  – Curation of Machine Listening Training Set
  – Ground Truth Annotation
  – Classifier Model Training
• MoodGrid User Interface
• Example Implementations
• Demo
Powering the Music Experience
Gracenote provides music metadata and technology to leading music services, app developers, and consumer electronics & auto manufacturers.
• Gracenote MusicID®
• Scan & Match
• Cover Art & Artist Images
• Gracenote Discover™
[Figure: Gracenote capabilities (Music Identification, Media Management, Discovery, More Like This™, Links, Music Channels, Lyrics, Artist Imagery, Cover Art); logos: Apple, Amazon, Pandora, BMW, LG, Sony, Panasonic, Ford, GM]
Mission: To create better ways to discover and enjoy digital entertainment
The Challenges of Music Navigation & Discovery
Today's listener has the world of music at their fingertips.
• iTunes Store: 28M songs
• Spotify: 18M
• Amazon MP3 Store: 20M
• Rhapsody: 11M
• YouTube: Lots
• Pandora: 1M
Given all these options, finding music is a challenge.
• These methods still prevail:
  – Search
  – Browsing by Artist, Album, Song
  – Seed-based Recommendations
Searching assumes the listener knows what they are looking for.
• Returns only a small number of items to listen to.
• No discovery.
Browsing by Artist, Album, Song
• Somewhat useful for organizing personal collections
• Doesn't scale to massive online catalogs
Seed-based Recommendations
• Create "radio stations" based on a "seed" Artist or Track.
• Endless playback with good variety and discovery.
• Still requires the listener to know which seed Artist/Track to use.
Help Is on the Way: Mood for Music Discovery & Navigation
Gracenote has analyzed over 30 million unique recordings and generated a sonic mood profile for each.
• This data can be delivered to client services, devices and vehicles worldwide to power consumer digital music services.
• Sonic Mood can be used either "behind the scenes" to make internet radio, playlists & recommendations smarter,
• or as a way to help users find and navigate to music in an intuitive manner.
Our Goal
Make it easy for consumers to get one-touch access to a focused, mood-based listening experience.
• Offer access in a way that parallels listeners' own language for describing the music experience they want: "Romantic", "Sentimental", "Thrilling", "Energizing", etc.
• Potentially amplify, maintain or change the user's current mood state, inducing an appropriate personal or shared experience.
Gracenote Sonic Moods for Navigation & Discovery
Solution Overview
• Scalable and Global Recording-Level Mood Descriptors
• Combines Gracenote's unique capabilities:
  – International Team of Expert Musicologists
  – Advanced Classifier Models
  – Massive Network of End-User Client Apps to Gather DSP Features
Process
• Machine Listening
  – Sonic Mood Taxonomy > 10K Songs > Expert Annotation
  – Audio Features > Model Training > Classification of 30M Songs
  – Output: Rich 101-Dimension "Sonic Mood Style" Profile
  – Each song receives a score for each of the 101 mood dimensions
• Correlations between moods then enable the system to understand the relationships between different moods
Context for Creating the Sonic Mood Taxonomy
We chose to create a new taxonomy that was informed by existing models, yet more specifically targeted toward our use case of consumer recorded-music navigation and discovery. These use-case requirements included:
• A sufficiently granular taxonomy to enable focused playlists, recommendations and radio
• Correlation with the colloquial meaning of "mood" in the context of consumer music selection
• Capturing those aspects of musical mood expression that are particularly important to how music mood is perceived by listeners
[Figure: related concepts: Sensation, Emotion, Feeling, Mood, "Atmosphere", Temperament]
Creation of a Single Taxonomy Covering All Musical Mood Expression Is a Challenge
There is incredible diversity in how mood is expressed musically if one examines the complete body of recorded music, across all genres, global origins, and time periods through history.
• The sonic vocabulary of, for example, Western classical orchestral music versus an industrial metal band is radically different.
• Although each may express "exciting", "brooding", "serious", or "dramatic" moods, these expressions are quite different both in their acoustic signal and in how they are perceived.
• We have structured our taxonomy to treat these in a separate, yet related, manner so that listeners can get to exactly what they want.
[Figure: timeline spanning the 1600s to the 2010s]
Gracenote "Sonic Moods" are typically compound terms combining mood, feeling and atmosphere, sometimes with additional cultural associations.
• An even more accurate term for these would be "Sonic Mood Styles".
• This approach provides additional differentiation beyond that of pure emotion terms.
• We have not shied away from using culturally specific or colloquial terms as part of our taxonomy, e.g. "Cool", "Creepy", "Cosmic", "Groovy".
Gracenote Sonic Mood Taxonomy
• Level 1 contains 26 single-word terms.
• Level 2 contains 101 multi-word terms.
• Each Level 1 term contains exactly four Level 2 terms; the hierarchy is rationalized this way purely for ease of use.
Taxonomy grid (axis labels: Positive, Dark, Calm, Energetic), listed row by row as Level 1 term: Level 2 terms
• Peaceful: Pastoral / Serene, Delicate / Tranquil, Reverent / Healing, Quiet / Introspective
• Easygoing: Hopeful / Breezy, Cheerful / Playful, Friendly, Charming / Easygoing
• Upbeat: Carefree Pop, Party / Fun, Soulful / Easygoing, Happy / Soulful
• Lively: Showy / Rousing, Lusty / Jaunty, Playful / Swingin, Exuberant / Festive
• Excited: Loud Celebratory, Euphoric Energy, Upbeat Pop Groove, Happy Excitement
• Tender: Refined / Mannered, Awakening / Stately, Romantic / Lyrical, Light Groovy
• Romantic: Sweet / Sincere, Heartfelt Passion, Dramatic / Romantic, Lush / Romantic
• Empowering: Strong / Stable, Powerful / Heroic, Dramatic Emotion, Idealistic / Stirring
• Stirring: Invigorating / Joyous, Jubilant / Soulful, Focused Sparkling, Triumphant / Rousing
• Rowdy: Ramshackle / Rollicking, Wild / Rowdy, Confident / Tough, Driving Dark Groove
• Sentimental: Tender / Sincere, Gentle Bittersweet, Lyrical Sentimental, Cool Melancholy
• Sophisticated: Suave / Sultry, Dark Playful, Intimate Bittersweet, Smoky / Romantic
• Sensual: Soft Soulful, Sensual Groove, Dreamy Pulse, Intimate
• Fiery: Dark Sparkling Lyrical, Fiery Groove, Passionate Rhythm, Energetic Abstract Groove
• Energizing: Arousing Groove, Heavy Beat, Edgy / Sexy, Abstract Beat
• Melancholy: Mysterious / Dreamy, Light Melancholy, Wistful / Forlorn, Sad / Soulful
• Cool: Casual Groove, Wary / Defiant, Cool Confidence, Dark Groovy
• Yearning: Bittersweet Pop, Energetic Yearning, Sensitive / Exploring, Energetic Dreamy
• Urgent: Dark Pop, Dark Pop Intensity, Dark Urgent, Energetic Anxious
• Defiant: Heavy Brooding, Hard Positive Excitement, Attitude / Defiant, Hard Dark Excitement
• Somber: Solemn / Spiritual, Enigmatic / Mysterious, Dark Cosmic, Creepy / Ominous
• Gritty: Sober / Determined, Strumming Yearning, Depressed / Lonely, Gritty / Soulful
• Serious: Melodramatic, Hypnotic Rhythm, Serious / Cerebral, Thrilling
• Brooding: Evocative / Intriguing, Energetic Melancholy, Dreamy Brooding, Alienated / Brooding
• Aggressive: Dark Hard Beat, Heavy Triumphant, Chaotic / Intense, Aggressive Power
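The two-level hierarchy maps naturally onto a dictionary plus a reverse index. The sketch below is illustrative only: it assumes each Level 1 cell of the slide's taxonomy grid covers a 2 x 2 block of Level 2 cells, includes just three Level 1 terms, and uses hypothetical names (`TAXONOMY`, `PARENT`, `level1_of`).

```python
# Level 1 -> Level 2 groupings as read off the taxonomy grid (assumed 2 x 2
# blocks); only three of the Level 1 terms are included in this sketch.
TAXONOMY = {
    "Peaceful": ["Pastoral / Serene", "Delicate / Tranquil",
                 "Reverent / Healing", "Quiet / Introspective"],
    "Excited": ["Loud Celebratory", "Euphoric Energy",
                "Upbeat Pop Groove", "Happy Excitement"],
    "Aggressive": ["Dark Hard Beat", "Heavy Triumphant",
                   "Chaotic / Intense", "Aggressive Power"],
}

# Reverse index so any Level 2 label can be rolled up to its Level 1 parent.
PARENT = {child: parent
          for parent, children in TAXONOMY.items()
          for child in children}


def level1_of(level2_mood):
    """Return the Level 1 term that contains the given Level 2 term."""
    return PARENT[level2_mood]


# The slide states that each Level 1 term holds exactly four Level 2 terms.
assert all(len(children) == 4 for children in TAXONOMY.values())

print(level1_of("Euphoric Energy"))  # Excited
```

A rollup like `level1_of` is one way to serve both a simple Level 1 search and a detailed Level 2 search from the same per-recording primary-mood labels.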
Internationalization
The sonic mood classes must be labeled in a way that is understandable and resonates with local users around the globe.
• A rote translation of the mood term from our source language may not be sufficient.
• Instead, our local music editors listen to a representative sample of recordings from each class to ensure that they directly perceive the specific musical qualities these songs share.
• They are then free to express the mood label in colloquial terms that will best resonate with the local population.
Mood Similarity & Dissimilarity
Each sonic mood is related to every other sonic mood via a positive or negative correlation value.
• With such a granular taxonomy, this element is essential for enabling playlisting, recommendation, radio and taste-profiling applications.
• For example, in a radio application it allows us to associate and play music whose mood is very similar, yet not identical, to that of a seed song.
• Without this capability, we would be limited to presenting only music whose mood is identical to the seed's.
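To illustrate how a correlation table lets a radio application go beyond identical moods, here is a minimal sketch that keeps catalog tracks whose primary mood correlates with the seed's mood above a threshold, ranked by correlation. The correlation values, the 0.5 threshold, the track titles and the function names are all hypothetical.

```python
# Hypothetical, made-up correlations between Level 2 moods (not Gracenote data).
MOOD_CORRELATION = {
    ("Light Melancholy", "Cool Melancholy"): 0.8,
    ("Light Melancholy", "Wistful / Forlorn"): 0.6,
    ("Light Melancholy", "Happy Excitement"): -0.7,
}


def mood_corr(a, b):
    """Symmetric lookup; identical moods correlate perfectly, unknown pairs are 0."""
    if a == b:
        return 1.0
    return MOOD_CORRELATION.get((a, b), MOOD_CORRELATION.get((b, a), 0.0))


def radio_candidates(seed_mood, catalog, min_corr=0.5):
    """Titles whose primary mood correlates with seed_mood at least min_corr,
    most similar first. catalog is a list of (title, primary_mood) pairs."""
    scored = [(mood_corr(seed_mood, mood), title) for title, mood in catalog]
    kept = [pair for pair in scored if pair[0] >= min_corr]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [title for _, title in kept]


CATALOG = [
    ("Track A", "Cool Melancholy"),
    ("Track B", "Happy Excitement"),
    ("Track C", "Light Melancholy"),
    ("Track D", "Wistful / Forlorn"),
]
print(radio_candidates("Light Melancholy", CATALOG))
# ['Track C', 'Track A', 'Track D']
```

Raising `min_corr` narrows the station toward the seed's exact mood; lowering it trades precision for variety.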
Sonic Mood and Genre
Some sonic moods are expressed more frequently in some genres than in others.
• From a practical perspective, we cannot completely dissociate sonic mood from music genre.
• The presence or absence of vocals and percussion also has a great impact on perceived sonic mood.
[Figure: sonic moods typical of New Age (Pastoral / Serene, Delicate / Tranquil, Quiet / Introspective, Reverent / Healing, Mysterious / Dreamy, Hopeful / Breezy, Dark Cosmic, Dramatic / Romantic) and of Metal (Aggressive Power, Hard Dark Excitement, Chaotic / Intense, Heavy Brooding, Heavy Triumphant, Confident / Tough, Loud Celebratory, Wild / Rowdy)]
Sonic vs. Lyrical Mood
There are many emotions and moods that are fundamental to human experience yet are not articulately expressed via audio alone; lyrical content or other context is required.
• Our current system does not incorporate any understanding of lyrical content, as it can be neither directly perceived nor extracted from the acoustic source alone.
• To the extent that the vocalizations present in the acoustic signal align with the mood of the lyric, there will be correlation, but only as a result of the acoustic signal.
Instrumental vs. Vocal
Our taxonomy, training set and classifier place equal or greater emphasis on instrumental expressions of mood than on mood expressed via vocals.
• So, although the direct expression of emotion in the vocals and other qualities of vocals (timbre, range, gender) contribute to the classification, they are not necessarily the primary element.
• The system has to be sufficiently robust to handle vocal and instrumental music equally well.
Training Library of Music for Machine Listening
Our production system has been trained on a hand-selected body of 10,000 recordings.
• The objective is for this set to include sufficient representative examples of all sonic moods.
• The training set includes music from all genres, eras and geographic origins.
• Recordings that are judged to be particularly pure expressions of certain sonic moods are given preference.
Mood Annotation of Training Library
Each training-set recording is annotated with one of over 300 sonic mood classes, based on the overall impression of the recording.
• If there is a significant range of sonic moods within the song, only an excerpt that is representative of a single mood is selected for training.
• Annotation is performed by musicologists employed by Gracenote who operate under a common set of definitions for each mood, maintaining editorial consistency.
[Figure: example annotations: Cool Melancholy, Intimate Bittersweet, Enigmatic / Mysterious, Energetic Anxious]
Scaling to Millions of Tracks
• Human annotation does not scale to the millions of tracks in online catalogs.
• This is where machine learning comes in.
Supervised Machine Learning
[Figure: supervised machine learning diagram, from http://nltk.googlecode.com/]
Audio Features
• A technique for representing audio in a perceptually and musically meaningful way.
[Figure: ASE feature plot, Frequency Band vs. Frame]
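The slides do not specify Gracenote's audio features; the plot is only labeled ASE, frame by frequency band. As a generic stand-in, this sketch computes a frame-by-band matrix of log spectral energies using a naive DFT in pure Python. Equal-width bands are used for brevity, whereas a perceptually motivated feature would more likely use logarithmically or mel-spaced bands. All function names and parameter values (64-sample frames, hop of 32, 8 bands) are assumptions.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a mono signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short toy frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def band_energies(frame, n_bands):
    """Log10 energy in n_bands equal-width groups of DFT bins."""
    mags = magnitude_spectrum(frame)
    width = len(mags) // n_bands  # any leftover top bins are ignored
    return [math.log10(sum(m * m for m in mags[b * width:(b + 1) * width]) + 1e-12)
            for b in range(n_bands)]


def feature_matrix(samples, frame_len=64, hop=32, n_bands=8):
    """One row per frame, one column per frequency band."""
    return [band_energies(frame, n_bands)
            for frame in frame_signal(samples, frame_len, hop)]


# Toy input: a 1 kHz sine sampled at 8 kHz (256 samples).
samples = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
features = feature_matrix(samples)
print(len(features), len(features[0]))  # 7 8
```

For this 1 kHz tone, each 64-sample frame puts its energy in DFT bin 8, so band 2 (bins 8 to 11) dominates every row of the matrix.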
Training the Classifier
• Using all the audio features from a training set of songs with a particular Mood label, the classifier creates a probabilistic model that describes that Mood in terms of the distribution of the underlying features.
[Figure: audio labeled "Somber" → training features → trained model]
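The slides describe a per-mood probabilistic model of the feature distribution but do not name the model family. As an assumption, this sketch fits the simplest such model, a diagonal Gaussian (per-dimension mean and variance), for each mood label. Each input row can be one frame's feature vector, so a single labeled song contributes many rows; `train_mood_models`, the variance floor and the training data are hypothetical.

```python
from statistics import fmean, pvariance


def train_mood_models(labeled_features, min_var=1e-6):
    """Fit one diagonal-Gaussian model per mood label.

    labeled_features: list of (mood_label, feature_vector) pairs; all vectors
    must have the same length. Returns {mood: (means, variances)}.
    """
    by_mood = {}
    for mood, vec in labeled_features:
        by_mood.setdefault(mood, []).append(vec)

    models = {}
    for mood, vecs in by_mood.items():
        dims = list(zip(*vecs))  # one tuple of values per feature dimension
        means = [fmean(d) for d in dims]
        # Floor the variance so a constant feature cannot cause division by zero later.
        variances = [max(pvariance(d), min_var) for d in dims]
        models[mood] = (means, variances)
    return models


# Hypothetical 2-D feature vectors with ground-truth mood labels.
TRAINING = [
    ("Somber", [0.1, 0.2]),
    ("Somber", [0.3, 0.2]),
    ("Excited", [0.9, 0.8]),
    ("Excited", [0.7, 1.0]),
]
models = train_mood_models(TRAINING)
print(models["Somber"])
```

A production system would likely use a richer density model per mood (for example a mixture of Gaussians) and far higher-dimensional features; the structure, one fitted model per labeled mood, stays the same.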
Classification
[Figure: unlabeled audio → features → classification model for "Somber" → "Somber": 30%]
• Features from unlabeled audio are compared to each model, and the classifier estimates the probability that they belong to that Mood.
• The Mood with the highest probability is designated the Primary Mood.
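Assuming each mood is modeled as a diagonal Gaussian over feature vectors (the slides do not name the model family), this sketch scores a track's frames against every model, converts the summed log-likelihoods into probabilities under equal priors, and picks the highest-probability mood as the Primary Mood. The model parameters are toy values and the helper names are hypothetical.

```python
import math

# Hypothetical trained models: {mood: (means, variances)} for 2-D features.
MODELS = {
    "Somber": ([0.2, 0.2], [0.01, 0.01]),
    "Excited": ([0.8, 0.9], [0.01, 0.01]),
    "Peaceful": ([0.2, 0.8], [0.02, 0.02]),
}


def log_likelihood(vec, means, variances):
    """Log density of vec under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(vec, means, variances))


def mood_probabilities(frames, models):
    """Probability per mood for one track, assuming equal priors and
    independent frames. frames: list of feature vectors."""
    scores = {mood: sum(log_likelihood(f, m, v) for f in frames)
              for mood, (m, v) in models.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {mood: math.exp(s - top) for mood, s in scores.items()}
    total = sum(weights.values())
    return {mood: w / total for mood, w in weights.items()}


def primary_mood(probabilities):
    """The mood with the highest probability."""
    return max(probabilities, key=probabilities.get)


probs = mood_probabilities([[0.25, 0.3], [0.15, 0.25]], MODELS)
print(primary_mood(probs))  # Somber
```

Summing log-likelihoods over many frames treats them as independent and yields very peaked probabilities; averaging the per-frame log-likelihoods is one way to soften them. The full dictionary, one value per mood, is the kind of per-mood score vector a mood profile could be built from.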
New Favorite (Alison Krauss + Union Station): Light Melancholy
She Is Beautiful (Andrew W.K.): Hard Positive Excitement
MoodGrid for User Interface, Navigation & Selection
Categories & Valence–Arousal Space UI: A Hybrid Approach
We use the valence–arousal space as an organizing paradigm. The discrete categories are then mapped to a best-fit location in the valence–arousal space to enable precise choice.
• This provides the best of both worlds: users know where to look generally to find celebratory vs. solemn vs. aggressive vs. peaceful music, but can then zero in on a very specific musical mood of their choice via the semantic labels.
• Note: Our representation of the valence–arousal space is rotated 90 degrees from most.
[Figure: MoodGrid with axis labels Positive, Energetic, Calm, Dark]
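To make the rotated layout concrete, this sketch maps a point in valence–arousal space (each coordinate scaled to [-1, 1]) to a cell of the 5 x 5 Level 1 grid, with valence running vertically (Positive at top, Dark at bottom) and arousal horizontally (Calm at left, Energetic at right). The orientation is inferred from the slides' axis labels, the scaling and function names are assumptions, and this point-to-cell lookup (as for a touch on the grid) is an illustration rather than Gracenote's best-fit mapping.

```python
# Level 1 terms as laid out on the taxonomy slide: top row (Positive) to
# bottom row (Dark), left column (Calm) to right column (Energetic).
LEVEL1_GRID = [
    ["Peaceful", "Easygoing", "Upbeat", "Lively", "Excited"],
    ["Tender", "Romantic", "Empowering", "Stirring", "Rowdy"],
    ["Sentimental", "Sophisticated", "Sensual", "Fiery", "Energizing"],
    ["Melancholy", "Cool", "Yearning", "Urgent", "Defiant"],
    ["Somber", "Gritty", "Serious", "Brooding", "Aggressive"],
]


def grid_cell(valence, arousal, size=5):
    """Map (valence, arousal), each in [-1, 1], to a (row, col) grid cell.

    Rotated layout: valence runs vertically (row 0 = most positive) and
    arousal horizontally (col 0 = calmest).
    """
    def bucket(x):
        # Scale [-1, 1] onto [0, size) and clamp so x == 1 stays in range.
        return min(int((x + 1) / 2 * size), size - 1)

    return size - 1 - bucket(valence), bucket(arousal)


def level1_at(valence, arousal):
    """Level 1 term under a (valence, arousal) point on the grid."""
    row, col = grid_cell(valence, arousal)
    return LEVEL1_GRID[row][col]


print(level1_at(0.9, 0.9))    # Excited
print(level1_at(-0.9, -0.9))  # Somber
```

Swapping which coordinate drives rows and which drives columns is all it takes to switch between this rotated layout and the conventional one with valence on the horizontal axis.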
Mercedes-Benz F-125 Concept Car with Gracenote MoodGrid
Mercedes is using Gracenote mood technology to showcase advanced music navigation.
• MoodGrid localized into German
• Navigation in MoodGrid done via gesture control
• A streaming radio station begins to play based on the selected mood
Visteon Prototype with Gracenote MoodGrid
Automotive systems vendor Visteon used Gracenote mood technology to power this prototype music-navigation HMI.
• Alternative 3 x 5 valence–arousal mood category space utilized
• Mood selection via wheel controller
Sony Music Licensing
Sony Music uses Gracenote mood data to make it easy for music directors to select just the right music for their next soundtrack project.
• Mood is an essential attribute for selecting the appropriate music for film, television and video productions.
• Gracenote provides a primary mood descriptor for each recording that is available to license.
• The 2-level mood hierarchy is used to allow simple or detailed search.
HABU for iPhone & iPad
An iOS app for mood navigation that projects the valence–arousal space into a circular format, from Gracenote partner company Gravity.
• The more music the user has of a given mood, the larger that mood's circle appears in the "mood map".
• Selecting a mood starts playback of songs of that mood from the user's collection.
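The slide says only that a mood's circle grows with the amount of music the user has in that mood. A common way to do this, assumed here, is to scale the radius with the square root of the track count so that circle area, not radius, is proportional to the count. `bubble_radii`, `max_radius` and the sample library are hypothetical.

```python
import math
from collections import Counter


def bubble_radii(primary_moods, max_radius=100.0):
    """Radius per mood so that circle area is proportional to its track count.

    primary_moods: one primary-mood label per track in the user's collection.
    The most common mood gets max_radius.
    """
    counts = Counter(primary_moods)
    if not counts:
        return {}
    largest = max(counts.values())
    return {mood: max_radius * math.sqrt(n / largest) for mood, n in counts.items()}


# Hypothetical collection: 4 Peaceful tracks, 1 Excited, 1 Somber.
LIBRARY = ["Peaceful"] * 4 + ["Excited", "Somber"]
radii = bubble_radii(LIBRARY)
print(radii)  # {'Peaceful': 100.0, 'Excited': 50.0, 'Somber': 50.0}
```

Scaling area rather than radius keeps a mood with four times as many tracks from looking sixteen times as large.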
Coachella Mood Maps
Gravity also recently created sonic mood infographics for the Coachella festival based on the HABU UI.
• For each day of the festival, 2 high-level moods were chosen to be featured.
• For each mood, matching songs by four different artists performing that day are displayed.
• An overall mood map for songs performed during the day is also shown.