CONTENT-WORTH DEBUT ARTIST CLASSIFIER
Wonyoung Lucas Seo
INTRODUCTION
HIPHOPLE.com
- US HIPHOP/R&B based Web Magazine platform
- Major Contents
- Translation
- Lyric Translation (* this is the part I contribute to)
- Subtitled Music Videos
- News of HIPHOP / R&B Industry
- Magazine
- Feature Articles
- Artist Interview
- Album Review
- Fashion / Life Style
Project Inspiration
- CEO’s needs
• Public reputation / preference analysis for
merchandise sales
• K-Hiphop buzz analysis in South-East Asian
region media
• Super Rookie Artist prediction analysis
• Messenger app chatbot
• And many more …
Why don’t I see you
working these days?
I’ve been totally occupied with
this machine learning thing
Do you have anything in mind that you
want to predict or analyze ?
I need some inspiration for my project
Actually, I do !!
This …
And that …
This would be awesome
Yeah and that too …
( CEO )
Predict Super Rookie ?? ( … sounds cool.
Maybe I should first focus on this one)
Let’s predict
Super Rookie!!!
Setting More
Specific Goal …
The HIPHOPLE Magazine team produces content on successful debut artists.
When producing content on a debut artist:
(i) Based on their domain/industry knowledge, editors decide whether or not to produce content on this particular rookie artist.
(ii) They spend a considerable amount of time researching the artist.
(iii) Or they lose the chance to be the first outlet to write about the artist.
How can we reduce research time by assigning a priority list,
and improve the decision-making process?
Let’s predict whether a rookie artist is worth creating content on and publishing / distributing to our subscribers, using a machine learning classification technique based on quantitative information.
HIPHOPLE’s contents = Review + Interview + Feature Articles + SNS Account + @
PROJECT WORKFLOW
1. Define Variables / Input
2. Collect Data & Store in AWS MySQL DB Server
3. Labeling Target Variable Classes
4. ORM Data Query
5. Merging Data
6. Fitting Models
7. Feature Engineering
8. Model Cross-Validation and Optimization
DEFINE VARIABLES / INPUT
Target Variable
Binary Classification
1 : Worth creating content on this artist
0 : Ignore this artist
HIPHOPLE editors participated in the labeling process based on their domain knowledge
Independent Variables (Features, Input)
Rows : Debut albums released from Jan 1st 2011 to Apr 18th 2018
- Genre
- Count of singles released before the official album release
- Count of articles published by music / culture media press
- Ratings from music media press
- Number of SNS followers of each artist
COLLECTING DATA (CRAWLING)
| Crawling Data | Crawling Target | Python Library |
|---|---|---|
| Debut album by year | www.wikipedia.com | Requests, BeautifulSoup |
| Ratings | www.metacritic.com | Scrapy |
| Ratings | www.pitchfork.com | Requests |
| Ratings | www.albumoftheyear.com | Requests, BeautifulSoup |
| SNS followers of each artist | www.chartmetric.io | Selenium |
| Count of related articles | www.billboard.com | Requests, BeautifulSoup |
| Count of related articles | www.genius.com | Requests, BeautifulSoup |
| Count of related articles | www.thesource.com | Requests, BeautifulSoup |
| Count of related articles | www.xxlmag.com | Requests, BeautifulSoup |

Data Crawling → Stored in AWS EC2 MySQL DB
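As a rough illustration of the article-count crawl, the sketch below counts headlines that mention an artist in a page that has already been downloaded. It uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages. The `<h3 class="headline">` markup, the `HeadlineCounter` class, and the sample HTML are all assumptions for illustration, not the real structure of any site listed above.

```python
from html.parser import HTMLParser


class HeadlineCounter(HTMLParser):
    """Count headline elements whose text mentions a given artist.

    Assumes (hypothetically) that each article headline is an
    <h3 class="headline"> element; real sites will differ.
    """

    def __init__(self, artist):
        super().__init__()
        self.artist = artist.lower()
        self.in_headline = False
        self.buffer = []
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h3" and ("class", "headline") in attrs:
            self.in_headline = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_headline:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_headline:
            # Case-insensitive match on the full headline text.
            if self.artist in "".join(self.buffer).lower():
                self.count += 1
            self.in_headline = False


def count_articles(html, artist):
    parser = HeadlineCounter(artist)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up page content, not taken from any real site.
SAMPLE_HTML = """
<h3 class="headline">Artist A drops debut album</h3>
<h3 class="headline">Festival lineup announced</h3>
<h3 class="headline">Five questions with <b>Artist A</b></h3>
"""

print(count_articles(SAMPLE_HTML, "Artist A"))
```

In a real crawl the HTML string would come from an HTTP response, and the resulting count would be stored per artist and source.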
TARGET VARIABLE CLASS LABELING
• Editors participated in labeling the class as 1 or 0
• The labeling sheet was shared via Google Spreadsheet
Labeling
• If you have created any content on this artist/album, please label it as the numeral “1”
• If you don’t remember but still think it is worth creating content, please label it as “1”
• If you have not created any content related to this artist/album, please label it as “0”
MERGE DATA
Data size : 1083 rows
- Label (Target Variable)
Features : 15
Article Counts
- Billboard
- Genius
- The Source
- XXL Magazine
Album Info
- Genre
- Count of Singles
SNS
- Twitter followers
- Instagram followers
- Facebook page likes
- YouTube channel subscribers
- Soundcloud followers
- Spotify followers
Ratings
- Album of the Year
- Pitchfork
- Metacritic
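The merge step can be pictured as a left join of each crawled source onto the list of debut albums. The sketch below is a minimal plain-Python version under stated assumptions: the `(artist, album)` join key, the field names, and the sample values are hypothetical, and the project itself may well have used a database query or a dataframe library instead.

```python
def merge_sources(albums, *sources):
    """Left-join each source onto the album list by (artist, album) key.

    Missing values stay as None so that imputation can be decided later.
    """
    merged = []
    for album in albums:
        key = (album["artist"], album["album"])
        row = dict(album)
        for name, records, fields in sources:
            record = records.get(key, {})
            for field in fields:
                row[f"{name}_{field}"] = record.get(field)
        merged.append(row)
    return merged


# Made-up sample records (not project data).
albums = [
    {"artist": "Artist A", "album": "First", "genre": "Hiphop", "label": 1},
    {"artist": "Artist B", "album": "Hello", "genre": "R&B", "label": 0},
]
article_counts = {("Artist A", "First"): {"billboard": 12, "xxl": 7}}
ratings = {
    ("Artist A", "First"): {"pitchfork": 7.8},
    ("Artist B", "Hello"): {"pitchfork": 6.1},
}

rows = merge_sources(
    albums,
    ("articles", article_counts, ["billboard", "xxl"]),
    ("rating", ratings, ["pitchfork"]),
)
for row in rows:
    print(row)
```

Keeping unmatched values as `None` leaves the choice of imputation to the feature engineering step.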
FITTING MODELS
Classification Metrics (Class 1)

| Model | Precision | Recall | F1 Score | AUC | Comment |
|---|---|---|---|---|---|
| KNN : K-Nearest Neighbours | 0.79 | 0.43 | 0.55 | 0.83 | |
| SGD : Stochastic Gradient Descent | 0.31 | 0.92 | 0.46 | 0.66 | |
| Decision Tree | 0.62 | 0.65 | 0.63 | 0.86 | |
| Random Forest | 0.74 | 0.67 | 0.70 | 0.91 | ★ |
| Gradient Boosting | 0.69 | 0.57 | 0.62 | 0.93 | |
| XGBoost : Extreme Gradient Boosting | 0.78 | 0.64 | 0.70 | 0.93 | |

Classification Algorithms : Scikit-Learn Library
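The F1 score in the comparison above is the harmonic mean of precision and recall. The short sketch below recomputes it for a few of the reported rows as a sanity check; the helper name `f1_score` is my own and is not a call into any library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for class 1, copied from the model comparison above.
reported = {
    "Random Forest": (0.74, 0.67),
    "XGBoost": (0.78, 0.64),
    "Decision Tree": (0.62, 0.65),
}
for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```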
Reason for selecting Random Forest Classifier
Recall (Sensitivity) was the major consideration for this project
- Impact of False Positives (FP), which Precision penalizes
• Incorrect prediction : predicted Class 1 for data whose actual class is 0
• Content is created even though the artist & album are not worth it
• More time and energy spent on the wrong class → wasted internal resources
- Impact of False Negatives (FN), which Recall penalizes
• Incorrect prediction : predicted Class 0 for data whose actual class is 1
• An artist & album that should not have been ignored are ignored
• Higher chance of other media publishing content first
• Higher chance of losing potential subscribers
• Higher chance of losing the opportunity to arrange a management contract with the artist in South Korea
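To make the FP / FN distinction concrete, the sketch below counts both error types from a list of actual and predicted labels, then derives precision and recall from them. The labels are made up for illustration and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # content produced for an artist not worth covering
        elif predicted == 0 and actual == 1:
            fn += 1  # worthwhile artist ignored
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for eight debut albums (not project data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

print(confusion_counts(y_true, y_pred))
print(precision_recall(y_true, y_pred))
```

Each false negative lowers recall and each false positive lowers precision, which is why a project that fears missed artists more than wasted effort would watch recall first.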
FEATURE ENGINEERING
Feature Engineering Trials

| Feature | Type | Description | Performance Improvement (O = yes, X = no) |
|---|---|---|---|
| Genre | One-Hot Encoding | Create dummy variables with one-hot encoding | O |
| Ratings | Null Value Imputation | Fill in a rating score of “0” for albums that did not receive any ratings | O |
| Buzz, Rating, SNS Followers | Binning | Bin the range of numeric values; create dummy variables for each bin | X |
| SNS Followers | Ratio | [Followers on each SNS / Total followers]; calculate the ratio for each SNS platform | X |
| SNS Followers | Re-Grouping | Private SNS (Twitter, Instagram); page & channel based SNS (Facebook, YouTube); music based SNS (Soundcloud, Spotify) | X |
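The two trials reported as helpful, one-hot encoding of genre and zero imputation of missing ratings, can be sketched in plain Python as follows. The category list, field names, and sample row are assumptions for illustration; the project may have used library encoders instead.

```python
def one_hot(value, categories):
    """Return a dict of 0/1 dummy variables, one per known category."""
    return {f"genre_{c}": int(value == c) for c in categories}


def impute_zero(rating):
    """Replace a missing rating (None) with 0, as in the imputation trial."""
    return 0.0 if rating is None else float(rating)


def engineer(row, genres, rating_fields):
    features = one_hot(row["genre"], genres)
    for field in rating_fields:
        features[field] = impute_zero(row.get(field))
    return features


GENRES = ["Hiphop", "R&B"]  # hypothetical category list
RATING_FIELDS = ["pitchfork", "metacritic"]

# Made-up album row with one missing rating.
row = {"genre": "R&B", "pitchfork": 7.2, "metacritic": None}
print(engineer(row, GENRES, RATING_FIELDS))
```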
IMPROVING PERFORMANCE & OPTIMIZATION
Recall Score Improvement
Imbalance Problem
- Class 1 has far fewer samples than Class 0 …
• The classifier measures performance based on “Accuracy”
• The classifier is therefore biased towards the majority class and shows very poor classification rates on the minority class → while still showing fairly good accuracy
- Applied under-sampling and adjusted the class ratio to 50:50
- Trade off Precision → focus on Recall score
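Random under-sampling to a 50:50 class ratio can be sketched with the standard library alone. The function below is a simplified stand-in for a library under-sampler, not the project's actual code; it assumes binary labels 0 and 1, and the toy data is made up.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Stand-in for a library under-sampler; assumes binary labels 0 and 1.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    minority_size = min(len(by_class[0]), len(by_class[1]))
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, minority_size))
    kept.sort()  # keep the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data: 2 positives among 10 rows (not the project's real class ratio).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
rows = [f"album_{i}" for i in range(len(labels))]

sampled_rows, sampled_labels = random_under_sample(rows, labels)
print(sampled_labels.count(0), sampled_labels.count(1))
```

Under-sampling is normally applied to the training split only, so that evaluation still reflects the real class ratio.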
Classification Metrics (Class 1)

| Under-Sampling Method | Precision | Recall | F1 Score | AUC | Comment |
|---|---|---|---|---|---|
| ClusterCentroids | 0.45 | 0.95 | 0.61 | 0.86 | |
| Random Under Sampler | 0.67 | 0.88 | 0.89 | 0.94 | ★ |
| CondensedNearestNeighbour | 0.61 | 0.80 | 0.69 | 0.92 | |
| AllKNN | 0.54 | 0.89 | 0.67 | 0.92 | |
| InstanceHardnessThreshold | 0.60 | 0.89 | 0.72 | 0.92 | |
| NearMiss | 0.32 | 0.95 | 0.47 | 0.74 | |
| NeighbourhoodCleaningRule | 0.60 | 0.80 | 0.69 | 0.93 | |
| OneSidedSelection | 0.76 | 0.73 | 0.75 | 0.92 | |
| TomekLinks | 0.75 | 0.71 | 0.73 | 0.95 | |

Under-Sampling : Imblearn Library
Optimization : Scikit-Learn GridSearchCV
• Tune hyperparameters
• Apply the grid search result
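A minimal sketch of a recall-oriented grid search with scikit-learn's `GridSearchCV` is shown below. It runs on synthetic data from `make_classification`, and the parameter grid is hypothetical: the slides do not state which hyperparameters or values were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data with 15 features (not the project data).
X, y = make_classification(
    n_samples=300, n_features=15, weights=[0.8, 0.2], random_state=0
)

# Hypothetical search grid; the values actually tuned are not given.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall",  # select parameters by recall of class 1
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 2))
```

Setting `scoring="recall"` makes the search pick the parameter combination with the best cross-validated recall on the positive class, in line with the priority stated earlier.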
Classification Report

| | Precision | Recall | F1 Score |
|---|---|---|---|
| Class 0 | 0.98 | 0.87 | 0.92 |
| Class 1 | 0.65 | 0.93 | 0.76 |
| Avg / Total | 0.91 | 0.88 | 0.89 |

[Figure: ROC curve with AUC]
TEST NEW DATA!
Youngboy Never Broke Again – Until Death Call My Name
- Release date : Apr 27th 2018 (Genre : Hiphop)
- Debut studio album
* Answer * HIPHOPLE has not produced any content on this artist and album as of May 9th
Then … how would the model predict on this album?
Load model
Input data for the artist and album
Class prediction result:
0 → Ignore this artist and album
Probability prediction result:
30% → Low probability → Low priority
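The prediction step can be sketched as follows: take a fitted model, read the class 1 probability, and turn it into a class and a priority. `StubModel` stands in for the real trained classifier and simply returns a fixed 30% probability; the 0.5 threshold and the "high" / "low" labels are assumptions, not values from the project.

```python
class StubModel:
    """Stand-in for a trained classifier loaded from disk (hypothetical)."""

    def predict_proba(self, rows):
        # Always returns [P(class 0), P(class 1)] = [0.7, 0.3] per row.
        return [[0.7, 0.3] for _ in rows]


def triage(model, features, threshold=0.5):
    """Return (predicted class, P(class 1), priority label) for one album."""
    prob = model.predict_proba([features])[0][1]
    predicted = int(prob >= threshold)
    priority = "high" if predicted == 1 else "low"
    return predicted, prob, priority


features = [0.0] * 15  # placeholder feature vector, not real album data
print(triage(StubModel(), features))
```

With a real model, the stub would be replaced by the unpickled classifier, and the probability could be used to sort a list of new debut albums by priority.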
OH -YEAH
HOW CAN I IMPROVE FURTHER?
List of To-Do’s
1. Research other methods for tackling the imbalance problem that also increase Precision
2. Try the XGBoost algorithm and tune its hyperparameters
3. Small dataset
• Accumulate more data on future debut albums and reflect it in the model
• Crawl input data from more media outlets
• Text sentiment analysis would be a plus
4. Dash-based visualization web application
• Automation of crawling input data + prediction
- Crawl feature data → update dataset
- Load pickled model → predict and return result
For more information on this project …
Please visit the links below :)
Github
http://www.github.com/lucaseo
LinkedIn
http://www.linkedin.com/in/lucaseo
Github Blog
http://www.lucaseo.github.io
Please check the links below :D
Github Repository Source Code
https://github.com/lucaseo/content-worth-debut-artist-classification-project
Data Visualization Web Application
http://le-rookie-clf.herokuapp.com
And about me …
THANK YOU!

More Related Content

What's hot

Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)Paul Lamere
 
Alexi Murdoch Press Update
Alexi Murdoch Press UpdateAlexi Murdoch Press Update
Alexi Murdoch Press UpdateCitySlang
 
Last.fm Usability Presentation
Last.fm Usability PresentationLast.fm Usability Presentation
Last.fm Usability PresentationRobert Sherron
 
Social Tags and Music Information Retrieval (Part II)
Social Tags and Music Information Retrieval (Part II)Social Tags and Music Information Retrieval (Part II)
Social Tags and Music Information Retrieval (Part II)Paul Lamere
 
Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyChing-Wei Chen
 
Drupal case study: ABC Dig Music
Drupal case study: ABC Dig MusicDrupal case study: ABC Dig Music
Drupal case study: ABC Dig MusicDavid Peterson
 
Machine learning for creative AI applications in music (2018 nov)
Machine learning for creative AI applications in music (2018 nov)Machine learning for creative AI applications in music (2018 nov)
Machine learning for creative AI applications in music (2018 nov)Yi-Hsuan Yang
 
final Pitch Edit
final Pitch Editfinal Pitch Edit
final Pitch Editthe4birds
 
Deezer and Spotify for brands and labels
Deezer and Spotify for brands and labelsDeezer and Spotify for brands and labels
Deezer and Spotify for brands and labelsPlayApp
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At SpotifyAdam Kawa
 
Research at MAC Lab, Academia Sincia, in 2017
Research at MAC Lab, Academia Sincia, in 2017Research at MAC Lab, Academia Sincia, in 2017
Research at MAC Lab, Academia Sincia, in 2017Yi-Hsuan Yang
 
A2 music video evaluation 2 the one
A2 music video evaluation 2 the oneA2 music video evaluation 2 the one
A2 music video evaluation 2 the onevgutowski
 
C3 music magazine_treatment
C3 music magazine_treatment C3 music magazine_treatment
C3 music magazine_treatment cassieRix
 
Using mashup technology to improve findability
Using mashup technology to improve findabilityUsing mashup technology to improve findability
Using mashup technology to improve findabilitySten Govaerts
 

What's hot (18)

Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)Social Tags and Music Information Retrieval (Part I)
Social Tags and Music Information Retrieval (Part I)
 
Film pitch
Film pitchFilm pitch
Film pitch
 
Alexi Murdoch Press Update
Alexi Murdoch Press UpdateAlexi Murdoch Press Update
Alexi Murdoch Press Update
 
Last.fm Usability Presentation
Last.fm Usability PresentationLast.fm Usability Presentation
Last.fm Usability Presentation
 
Social Tags and Music Information Retrieval (Part II)
Social Tags and Music Information Retrieval (Part II)Social Tags and Music Information Retrieval (Part II)
Social Tags and Music Information Retrieval (Part II)
 
Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at Spotify
 
Drupal case study: ABC Dig Music
Drupal case study: ABC Dig MusicDrupal case study: ABC Dig Music
Drupal case study: ABC Dig Music
 
Machine learning for creative AI applications in music (2018 nov)
Machine learning for creative AI applications in music (2018 nov)Machine learning for creative AI applications in music (2018 nov)
Machine learning for creative AI applications in music (2018 nov)
 
final Pitch Edit
final Pitch Editfinal Pitch Edit
final Pitch Edit
 
Deezer and Spotify for brands and labels
Deezer and Spotify for brands and labelsDeezer and Spotify for brands and labels
Deezer and Spotify for brands and labels
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
 
Research at MAC Lab, Academia Sincia, in 2017
Research at MAC Lab, Academia Sincia, in 2017Research at MAC Lab, Academia Sincia, in 2017
Research at MAC Lab, Academia Sincia, in 2017
 
Music video pitch
Music video pitchMusic video pitch
Music video pitch
 
A2 music video evaluation 2 the one
A2 music video evaluation 2 the oneA2 music video evaluation 2 the one
A2 music video evaluation 2 the one
 
C3 music magazine_treatment
C3 music magazine_treatment C3 music magazine_treatment
C3 music magazine_treatment
 
Using mashup technology to improve findability
Using mashup technology to improve findabilityUsing mashup technology to improve findability
Using mashup technology to improve findability
 
Uo march2015 talk
Uo march2015 talkUo march2015 talk
Uo march2015 talk
 
Digipak planning
Digipak planningDigipak planning
Digipak planning
 

Similar to Project overview eng

MusicBiz2021 - Playlists and Beyond
MusicBiz2021 - Playlists and BeyondMusicBiz2021 - Playlists and Beyond
MusicBiz2021 - Playlists and BeyondPatriceKelly3
 
Conquer CADs Certification Training Day 2
Conquer CADs Certification Training Day 2Conquer CADs Certification Training Day 2
Conquer CADs Certification Training Day 22nyce2fail
 
Random Forests R vs Python by Linda Uruchurtu
Random Forests R vs Python by Linda UruchurtuRandom Forests R vs Python by Linda Uruchurtu
Random Forests R vs Python by Linda UruchurtuPyData
 
Deezer - Big data as a streaming service
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming serviceJulie Knibbe
 
Music Marketing Platforms - A Quick Hit Comparison
Music Marketing Platforms - A Quick Hit ComparisonMusic Marketing Platforms - A Quick Hit Comparison
Music Marketing Platforms - A Quick Hit Comparisonsongsponge
 
Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Karthik Murugesan
 
Pubcon Vegas 2010 - Social Media: Measurements & Tools
Pubcon Vegas 2010 - Social Media: Measurements & ToolsPubcon Vegas 2010 - Social Media: Measurements & Tools
Pubcon Vegas 2010 - Social Media: Measurements & ToolsAdam Proehl
 
Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...
Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...
Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...Aram Sinnreich
 
Designing a Better Online Music Store (by:Larm 2008)
Designing a Better Online Music Store (by:Larm 2008)Designing a Better Online Music Store (by:Larm 2008)
Designing a Better Online Music Store (by:Larm 2008)Vegard Sandvold
 
Understanding ai music discovery and recommendation systems
Understanding ai music discovery and recommendation systemsUnderstanding ai music discovery and recommendation systems
Understanding ai music discovery and recommendation systemsValerio Velardo
 
Nea research task 7 websites
Nea research task 7 websitesNea research task 7 websites
Nea research task 7 websitestwbsmediaconnell
 
Mosaic engr 245 lean launchpad stanford 2020
Mosaic engr 245 lean launchpad stanford 2020Mosaic engr 245 lean launchpad stanford 2020
Mosaic engr 245 lean launchpad stanford 2020Stanford University
 
OCR AS Media Exam Section B Prep
OCR AS Media Exam Section B Prep OCR AS Media Exam Section B Prep
OCR AS Media Exam Section B Prep hughes82
 
As evaluations new and improved (1)
As evaluations new and improved (1)As evaluations new and improved (1)
As evaluations new and improved (1)DannyAA
 
Site Search Analytics
Site Search Analytics Site Search Analytics
Site Search Analytics Stefan Thies
 
Lesson 5 ta researchto identify the conventions of a concept based
Lesson 5 ta researchto identify the conventions of a concept basedLesson 5 ta researchto identify the conventions of a concept based
Lesson 5 ta researchto identify the conventions of a concept basedsandylking
 
Music vid part four rep
Music vid part four repMusic vid part four rep
Music vid part four repsssfcmedia
 

Similar to Project overview eng (20)

MusicBiz2021 - Playlists and Beyond
MusicBiz2021 - Playlists and BeyondMusicBiz2021 - Playlists and Beyond
MusicBiz2021 - Playlists and Beyond
 
Conquer CADs Certification Training Day 2
Conquer CADs Certification Training Day 2Conquer CADs Certification Training Day 2
Conquer CADs Certification Training Day 2
 
Rapfeatures Pitch Deck
Rapfeatures Pitch DeckRapfeatures Pitch Deck
Rapfeatures Pitch Deck
 
Random Forests R vs Python by Linda Uruchurtu
Random Forests R vs Python by Linda UruchurtuRandom Forests R vs Python by Linda Uruchurtu
Random Forests R vs Python by Linda Uruchurtu
 
Deezer - Big data as a streaming service
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming service
 
Bot binder
Bot binderBot binder
Bot binder
 
Music Marketing Platforms - A Quick Hit Comparison
Music Marketing Platforms - A Quick Hit ComparisonMusic Marketing Platforms - A Quick Hit Comparison
Music Marketing Platforms - A Quick Hit Comparison
 
Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Trends in Music Recommendations 2018
Trends in Music Recommendations 2018
 
Pubcon Vegas 2010 - Social Media: Measurements & Tools
Pubcon Vegas 2010 - Social Media: Measurements & ToolsPubcon Vegas 2010 - Social Media: Measurements & Tools
Pubcon Vegas 2010 - Social Media: Measurements & Tools
 
Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...
Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...
Streaming Music in 2015: A Rapidly Growing, Maturing and Consolidating Market...
 
Designing a Better Online Music Store (by:Larm 2008)
Designing a Better Online Music Store (by:Larm 2008)Designing a Better Online Music Store (by:Larm 2008)
Designing a Better Online Music Store (by:Larm 2008)
 
Understanding ai music discovery and recommendation systems
Understanding ai music discovery and recommendation systemsUnderstanding ai music discovery and recommendation systems
Understanding ai music discovery and recommendation systems
 
A&R Research - The Process
A&R Research - The ProcessA&R Research - The Process
A&R Research - The Process
 
Nea research task 7 websites
Nea research task 7 websitesNea research task 7 websites
Nea research task 7 websites
 
Mosaic engr 245 lean launchpad stanford 2020
Mosaic engr 245 lean launchpad stanford 2020Mosaic engr 245 lean launchpad stanford 2020
Mosaic engr 245 lean launchpad stanford 2020
 
OCR AS Media Exam Section B Prep
OCR AS Media Exam Section B Prep OCR AS Media Exam Section B Prep
OCR AS Media Exam Section B Prep
 
As evaluations new and improved (1)
As evaluations new and improved (1)As evaluations new and improved (1)
As evaluations new and improved (1)
 
Site Search Analytics
Site Search Analytics Site Search Analytics
Site Search Analytics
 
Lesson 5 ta researchto identify the conventions of a concept based
Lesson 5 ta researchto identify the conventions of a concept basedLesson 5 ta researchto identify the conventions of a concept based
Lesson 5 ta researchto identify the conventions of a concept based
 
Music vid part four rep
Music vid part four repMusic vid part four rep
Music vid part four rep
 

Recently uploaded

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...amitlee9823
 

Recently uploaded (20)

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 

Project overview eng

  • 1. CONTENT-WORTH DEBUT ARTIST CLASSIFIER Wonyoung Lucas Seo
  • 3. HIPHOPLE.com ?????????? - US HIPHOP/R&B based Web Magazine platform - Major Contents - Translation - Lyric Translation( * this is the part I contribute ) - Subtitled MusicVideo - News of HIPHOP / R&B Industry - Magazine - Feature Articles - Artist Interview - Album Review - Fashion / Life Style
  • 4. Project Inspiration
      CEO’s needs:
      • Public reputation / preference analysis for merchandise sales
      • K-Hiphop buzz analysis in South-East Asian regional media
      • Super Rookie Artist prediction analysis
      • Messenger app chatbot
      • And many more …
      “Why don’t I see you working these days?”
      “I’ve been totally occupied with this machine learning thing. Do you have anything in mind that you want to predict or analyze? I need some inspiration for my project.”
      “Actually, I do!! This … and that … this would be awesome. Yeah, and that too …” (CEO)
  • 5. Predict Super Rookie?? (… sounds cool. Maybe I should focus on this one first.)
  • 6. Let’s predict Super Rookie!!! Setting a more specific goal …
      The Hiphople Magazine team produces content about successful debut artists.
      When producing content on a debut artist, editors:
      (i) Decide, based on domain/industry knowledge, whether or not to produce content on that particular rookie artist.
      (ii) Spend a considerable amount of time researching them.
      (iii) Or lose the chance to be the first outlet to write about the artist.
      How can we reduce research time by assigning a priority list, and improve the decision-making process?
      Let’s predict whether a rookie artist is worth creating content about and publishing / distributing to our subscribers, using a machine learning classification technique based on quantitative information.
  • 7. HIPHOPLE’s contents = Review + Interview + Feature Articles + SNS Account + @
  • 9. Project workflow
      1. Define Variables / Input
      2. Collect Data & Store in AWS MySQL DB Server
      3. Labeling Target Variable Classes
      4. ORM Data Query
      5. Merging Data
      6. Fitting Models
      7. Feature Engineering
      8. Model Cross-Validation and Optimization
  • 11. Target Variable: Binary Classification
      - 1: Worth creating content on this artist
      - 0: Ignore this artist
      HIPHOPLE editors participated in the labeling process, based on their domain knowledge.
  • 12. Independent Variables (Features, Input)
      Rows: debut albums released from Jan 1st 2011 to Apr 18th 2018
      - Genre
      - Count of single albums released before the official album release
      - Count of articles published by music / culture media press
      - Ratings from music media press
      - Number of SNS followers of each artist
  • 14. Data Crawling → Stored in AWS EC2 MySQL DB
      Crawling Data                | Crawling Target        | Crawling Python Library
      Debut album by year          | www.wikipedia.com      | Requests, BeautifulSoup
      Ratings                      | www.metacritic.com     | Scrapy
                                   | www.pitchfork.com      | Requests
                                   | www.albumoftheyear.com | Requests, BeautifulSoup
      SNS followers of each artist | www.chartmetric.io     | Selenium
      Count of related articles    | www.billboard.com      | Requests, BeautifulSoup
                                   | www.genius.com         | Requests, BeautifulSoup
                                   | www.thesource.com      | Requests, BeautifulSoup
                                   | www.xxlmag.com         | Requests, BeautifulSoup
  • 16. Labeling
      • Editors participated in labeling each class as 1 or 0
      • The labeling sheet was shared via Google Spreadsheet
      • If you have created any content on this artist/album, please label it as “1” in Arabic numerals
      • If you don’t remember but still think it’s worth creating content, please label it as “1”
      • If you didn’t create any content related to this artist/album, please label it as “0”
  • 18. Data size: 1083 rows
      - Label (Target Variable)
      Features: 15
      - Article counts: Billboard, Genius, The Source, XXL Magazine
      - Album info: Genre, Count of single albums
      - SNS: Twitter followers, Instagram followers, Facebook page likes, YouTube channel subscribers, SoundCloud followers, Spotify followers
      - Ratings: Album of the Year, Pitchfork, Metacritic
  • 20. Classification Algorithms: Scikit-Learn Libraries
      Classification metrics (Class 1):
      Model                               | Precision | Recall | F1 Score | AUC  | Comment
      KNN (K-Nearest Neighbours)          | 0.79      | 0.43   | 0.55     | 0.83 |
      SGD (Stochastic Gradient Descent)   | 0.31      | 0.92   | 0.46     | 0.66 |
      Decision Tree                       | 0.62      | 0.65   | 0.63     | 0.86 |
      Random Forest                       | 0.74      | 0.67   | 0.70     | 0.91 | ★
      Gradient Boosting                   | 0.69      | 0.57   | 0.62     | 0.93 |
      XGBoost (Extreme Gradient Boosting) | 0.78      | 0.64   | 0.70     | 0.93 |
  • 21. Reason for selecting Random Forest Classifier
      Recall (sensitivity) was the major consideration for this project.
      - Impact of a False Positive (FP), which Precision accounts for:
        • Incorrect prediction: predicted Class 1 for data that is actually Class 0
        • We create content even though the artist and album are not worth it
        • More time and energy spent on the wrong class → wasted internal resources
      - Impact of a False Negative (FN), which Recall accounts for:
        • Incorrect prediction: predicted Class 0 for data that is actually Class 1
        • We ignore an artist and album that we shouldn’t have ignored
        • Higher chance that other media create content first
        • Higher chance of losing potential subscribers
        • Higher chance of losing the opportunity to arrange a management contract with the artist in South Korea
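The FP/FN trade-off on this slide maps directly onto the standard definitions of Precision, Recall and F1. The sketch below computes them from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the project's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class.

    tp: Class 1 predicted as Class 1
    fp: Class 0 predicted as Class 1 (wasted editorial effort)
    fn: Class 1 predicted as Class 0 (missed artist)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: few missed artists (fn), more false alarms (fp).
precision, recall, f1 = precision_recall_f1(tp=9, fp=6, fn=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A model tuned for Recall accepts a larger `fp` in exchange for a smaller `fn`, which is the preference the slide argues for.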
  • 23. Feature Engineering Trials (Performance improvement: O = yes, X = no)
      - Genre
        • One-Hot Encoding: create dummy variables with one-hot encoding. Performance improvement: O
      - Ratings
        • Null Value Imputation: fill in a rating score of “0” for albums that didn’t receive any ratings. Performance improvement: O
      - Buzz Rating
        • SNS Followers Binning: bin the range of numeric values and create dummy variables for each bin. Performance improvement: X
        • SNS Followers Ratio: [Followers of each SNS / Total followers], calculated for each SNS platform. Performance improvement: X
        • SNS Followers Re-Grouping: private SNS (Twitter, Instagram), page & channel based SNS (Facebook, YouTube), music based SNS (SoundCloud, Spotify). Performance improvement: X
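As a rough illustration of three of the trials on this slide (one-hot encoding, zero imputation for missing ratings, and the per-platform follower ratio), here is a minimal standard-library sketch. The helper names and sample values are hypothetical; the project itself may well have used pandas or scikit-learn utilities instead.

```python
def one_hot(values, prefix="genre"):
    """One dummy column per distinct category, 1 where the row matches."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


def impute_zero(ratings):
    """Replace missing ratings (None) with a score of 0."""
    return [0.0 if r is None else float(r) for r in ratings]


def follower_ratios(followers):
    """Followers of each SNS divided by the artist's total followers."""
    total = sum(followers.values())
    if total == 0:
        return {name: 0.0 for name in followers}
    return {name: count / total for name, count in followers.items()}


# Hypothetical sample rows, for illustration only.
genres = ["Hiphop", "R&B", "Hiphop"]
ratings = [7.5, None, 8.0]
sns = {"twitter": 1000, "instagram": 3000, "spotify": 6000}

print(one_hot(genres))
print(impute_zero(ratings))
print(follower_ratios(sns))
```

Note that filling missing ratings with 0 treats "no rating" as the lowest possible score, which is a modelling choice rather than a neutral default.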
  • 24. IMPROVING PERFORMANCE & OPTIMIZATION
  • 25. Recall Score Improvement
      Imbalance problem:
      - The number of Class 1 samples is far smaller than the number of Class 0 samples …
        • The classifier measures performance based on “Accuracy”
        • The classifier then becomes biased towards the majority class and shows very poor classification rates on the minority class → it still shows fairly good accuracy
      - Applied under-sampling and adjusted the class ratio to 50:50
      - Trade off Precision → focus on the Recall score
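The 50:50 adjustment described on this slide can be sketched as plain random under-sampling: keep every minority-class row and draw an equally sized random subset of the majority class. This is a standard-library stand-in, not the imblearn code the project used; the function name, seed and toy data are assumptions.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Down-sample every class to the size of the smallest class."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    # Size of the minority class decides how many rows each class keeps.
    n_keep = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()  # preserve the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy imbalanced data: 90 rows of Class 0, 10 rows of Class 1.
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Under-sampling is normally applied to the training split only, so that evaluation still reflects the real class ratio.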
  • 26. Under Sampling: Imblearn Library
      Classification metrics (Class 1):
      Sampler                   | Precision | Recall | F1 Score | AUC  | Comment
      ClusterCentroids          | 0.45      | 0.95   | 0.61     | 0.86 |
      Random Under Sampler      | 0.67      | 0.88   | 0.89     | 0.94 | ★
      CondensedNearestNeighbour | 0.61      | 0.80   | 0.69     | 0.92 |
      AllKNN                    | 0.54      | 0.89   | 0.67     | 0.92 |
      InstanceHardnessThreshold | 0.60      | 0.89   | 0.72     | 0.92 |
      NearMiss                  | 0.32      | 0.95   | 0.47     | 0.74 |
      NeighbourhoodCleaningRule | 0.60      | 0.80   | 0.69     | 0.93 |
      OneSidedSelection         | 0.76      | 0.73   | 0.75     | 0.92 |
      TomekLinks                | 0.75      | 0.71   | 0.73     | 0.95 |
  • 27. Optimization: Scikit-Learn GridSearchCV
      • Apply the GridSearch result
      • Tune hyperparameters
      Classification report:
      Class       | Precision | Recall | F1 Score
      Class 0     | 0.98      | 0.87   | 0.92
      Class 1     | 0.65      | 0.93   | 0.76
      Avg / Total | 0.91      | 0.88   | 0.89
      [Figure: ROC curve with AUC]
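A hedged sketch of what a recall-oriented grid search with scikit-learn's `GridSearchCV` might look like. It assumes scikit-learn is installed; the synthetic data from `make_classification` only stands in for the 15-feature dataset, and the parameter grid is illustrative rather than the grid actually used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data with 15 features (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=15, weights=[0.8, 0.2], random_state=0
)

# Illustrative grid; the values tried in the project are not given in the slides.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall",  # optimize Class 1 recall, the project's priority
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_recall = search.best_score_
print(best_params, round(best_recall, 3))
```

Setting `scoring="recall"` makes the search pick the combination with the highest cross-validated recall on the positive class instead of the default accuracy.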
  • 29. Youngboy Never Broke Again – Until Death Call My Name
      - Release date: Apr 27th 2018 (Genre: Hiphop)
      - Debut studio album
      * Answer *
      HIPHOPLE hasn’t produced any content on this artist and album as of May 9th.
      Then … how would the model predict on this album?
  • 30. Prediction
      - Load model
      - Input data from the artist and album
      - Class prediction result: 0 → Ignore this artist and album
      - Probability prediction result: 30% → Low probability → Low priority
      OH-YEAH
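The mapping on this slide from a predicted probability to a class and an editorial priority can be sketched as below. The 0.5 cut-off, the function names and the candidate list are assumptions made for illustration; only the 30% figure comes from the slide.

```python
def interpret(prob_class1, threshold=0.5):
    """Turn a Class 1 probability into (predicted class, priority label)."""
    label = 1 if prob_class1 >= threshold else 0
    priority = "high" if label == 1 else "low"
    return label, priority


def rank_by_priority(candidates):
    """Sort (name, probability) pairs so the most promising come first."""
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# The album from the slide: 30% probability of being content-worthy.
label, priority = interpret(0.30)
print(label, priority)

# Hypothetical candidates, to show how a research priority list could be built.
queue = rank_by_priority([("artist_a", 0.30), ("artist_b", 0.85), ("artist_c", 0.55)])
print(queue)
```

Ranking by probability rather than by the hard 0/1 label gives editors an ordered research queue, which is the priority list the project set out to produce.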
  • 31. HOW CAN I IMPROVE FURTHER?
  • 32. List of To-Do’s
      1. Research other methods for tackling the imbalance problem that also increase Precision
      2. Try the XGBoost algorithm and tune its hyperparameters
      3. Small dataset
         • Accumulate more data on debut albums in the future and reflect it in the model
         • Crawl input data from more media press
         • Text sentiment analysis would be a plus
      4. Dash-based visualization web application
         • Automation of crawling input data + prediction
           - Crawl feature data → update dataset
           - Load the pickled model file → predict and return the result
  • 33. For more information on this project … please check the links below :D
      - GitHub repository (source code): https://github.com/lucaseo/content-worth-debut-artist-classification-project
      - Data visualization web application: http://le-rookie-clf.herokuapp.com
      And about me … please visit the links below :)
      - GitHub: http://www.github.com/lucaseo
      - LinkedIn: http://www.linkedin.com/in/lucaseo
      - GitHub blog: http://www.lucaseo.github.io