Building Recommendation Products 101
Esh Kumar
Machine Learning & Data Products @ Spotify NYC
@eshvk
Who am I?
• UT Austin Machine Learning
• Building Recommendation Systems @ Mozilla, StumbleUpon & Spotify
Products @ Spotify
• Discover … to find new albums
• Discover Weekly … A weekly Playlist
• Editorial Playlist Recommendations
• Radio
Products @ StumbleUpon
• Content extraction and recommendation pipelines.
• Mobile Recommendations.
Products @ Mozilla
• Grouperfish: Generalized Clustering of large-scale text.
Product Personalization
• Understanding People
➡ User Experience, Cultural Variations
• Understanding Content
➡ Genres, Cultural knowledge
• Models
➡ Collaborative Filtering, Content Based
[Diagram labels: ML, Content, User]
Product Personalization
• Machine Learning does not trump a bad idea. 

• Idea -> Data-Driven Product Development -> ML (more like design than coding)
• News, Blogs, NLP
(http://musicmachinery.com/2014/02/10/gender-specific-listening/)
(http://musicmachinery.com/2014/02/13/age-specific-listening/)
• Manually tag attributes
• Curation
(latimes.com)
(https://research.google.com/bigpicture/music/)
(http://www.theverge.com/2012/3/18/2882372/netflix-recommended-genres-list)
• Collaborative Filtering (CF)
30 Million Songs…
What To Play?
75 Million People … 1 Person Every 3 Secs…
Recommendation Systems
• Predict user response to options.
• Rich field: matrix completion, ranking, text models, latent factor models.
• Several conferences annually: RecSys, NIPS, ICML, etc.
• Industry researchers include NFLX, GOOG, MS and more…
Similarity
Our problem is to figure out how similar two items are.
Mathematically, this means modeling a function Similarity(x, y) for all users and items, if possible.
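One common way to realize such a Similarity(x, y) function is cosine similarity between vectors that represent x and y. The sketch below is an illustration under assumptions: the slide does not say which similarity function is used, and the play-count vectors are made up.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here: an all-zero vector is similar to nothing
    return dot / (norm_x * norm_y)

# Hypothetical play-count vectors for three tracks over the same four users.
track_a = [5, 0, 3, 1]
track_b = [4, 0, 2, 1]
track_c = [0, 6, 0, 0]

print(cosine_similarity(track_a, track_b))  # close to 1: played by the same users
print(cosine_similarity(track_a, track_c))  # 0.0: no users in common
```

The same function works for user vectors or for learned latent vectors, which is why it shows up across most of the models discussed in this deck.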
Collaborative Filtering
Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!
Model you based on songs you played…
Predict your future based on similar users…
Millions of users and billions of streams…
…. so there is someone like you out there
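The conversation above can be sketched as a tiny user-based collaborative filter. This is a minimal illustration, not the production method: the Jaccard overlap measure, the listener names and the third listener are assumptions added for the example.

```python
def jaccard(a, b):
    """Overlap between two sets of liked tracks, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes):
    """Suggest tracks liked by the most similar other user but not by `user`."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return sorted(likes[nearest] - likes[user])

likes = {
    "first": {"P", "Q", "R", "S"},
    "second": {"Q", "R", "S", "T"},
    "third": {"X", "Y"},  # hypothetical extra listener with unrelated taste
}

print(recommend("first", likes))   # ['T']
print(recommend("second", likes))  # ['P']
```

With millions of users, the nearest-neighbor search is the expensive part, which is one motivation for the latent factor models that follow.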
Collaborative Filtering
The Netflix Prize.
A million dollars for beating NFLX’s best algorithms by ~10%.
Neighborhood Models
The Amazon approach…
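Assuming "the Amazon approach" refers to item-to-item collaborative filtering, a neighborhood model can be sketched as follows: count how often two items occur for the same user and normalize by how popular each item is. The baskets and the function name `item_neighbors` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math

def item_neighbors(baskets):
    """Return {item: [(neighbor, similarity), ...]} sorted by similarity."""
    count = defaultdict(int)      # number of users who have each item
    together = defaultdict(int)   # number of users who have both items of a pair
    for items in baskets.values():
        for item in items:
            count[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1

    neighbors = defaultdict(list)
    for (a, b), n in together.items():
        sim = n / math.sqrt(count[a] * count[b])  # cosine on binary vectors
        neighbors[a].append((b, sim))
        neighbors[b].append((a, sim))
    for item in neighbors:
        # Highest similarity first; ties broken alphabetically.
        neighbors[item].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(neighbors)

baskets = {
    "u1": {"P", "Q", "R"},
    "u2": {"Q", "R"},
    "u3": {"Q", "R", "S"},
    "u4": {"P", "S"},
}
neighbors = item_neighbors(baskets)
print(neighbors["Q"][0])  # ('R', 1.0): Q and R always appear together
```

The neighbor lists can be computed offline, so serving a recommendation is a lookup rather than a search over users.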
Matrix Completion
A matrix expresses a system. We model the data in the form of a matrix. For example, play counts for all songs and all users could be:
$$
\text{Users}\left\{
\overbrace{
\begin{pmatrix}
s_{1,1} & s_{1,2} & 14 & \cdots & s_{1,n} \\
s_{2,1} & s_{2,2} & 2 & \cdots & s_{2,n} \\
\vdots & \vdots & \vdots & & \vdots \\
s_{m,1} & s_{m,2} & 1 & \cdots & s_{m,n}
\end{pmatrix}
}^{\text{Song Plays}}
\right.
$$
Call Me Maybe (column), Esh (row): Esh listened to Call Me Maybe once…
$$
\approx
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}
\begin{pmatrix} t_1 & t_2 & \cdots & t_n \end{pmatrix}
$$
Matrix Completion is well studied …
Start with random vectors around the origin. Run alternating least
squares or gradient descent or stochastic gradient descent… All this
is Hadoopable™.
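A minimal single-machine sketch of the stochastic gradient descent variant follows. It is an illustration under assumptions, not the workshop code: the rank, learning rate, regularization strength and toy play counts are all made up, and alternating least squares is not shown.

```python
import random

def factorize(plays, n_users, n_tracks, rank=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Fit user and track vectors to observed (user, track, count) triples with SGD."""
    rng = random.Random(seed)
    plays = list(plays)  # copy so shuffling does not reorder the caller's list
    # Start with small random vectors around the origin.
    users = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    tracks = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_tracks)]
    for _ in range(epochs):
        rng.shuffle(plays)
        for u, t, count in plays:
            pred = sum(a * b for a, b in zip(users[u], tracks[t]))
            err = count - pred
            for k in range(rank):
                uk, tk = users[u][k], tracks[t][k]
                users[u][k] += lr * (err * tk - reg * uk)
                tracks[t][k] += lr * (err * uk - reg * tk)
    return users, tracks

def rmse(plays, users, tracks):
    """Root mean squared error over the observed entries only."""
    total = 0.0
    for u, t, count in plays:
        pred = sum(a * b for a, b in zip(users[u], tracks[t]))
        total += (count - pred) ** 2
    return (total / len(plays)) ** 0.5

# Hypothetical observed play counts: (user index, track index, count).
plays = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (1, 2, 1), (2, 2, 5), (2, 3, 4)]
before = rmse(plays, *factorize(plays, 3, 4, epochs=0))
after = rmse(plays, *factorize(plays, 3, 4))
print(before, after)  # the error on observed entries drops after training
```

Only observed entries contribute to the updates, so the product of a user vector and a track vector also yields a score for pairs that were never observed, which is the "completion" step.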
Hands On Coding…
Please point your browser to:
https://github.com/eshwaran/recs101workshop
30 Million Songs…
What To Play?
75 Million People … 1 Person Every 3 Secs…
1.5 Billion Playlists
Language Models
• Language models work well too. For example, a playlist could be considered a document, and you could learn latent vectors for its tracks (words).
• Then represent a User as a linear combination of their Tracks.
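The second bullet can be sketched as a weighted average of track vectors. The weights (for example play counts), the two-dimensional vectors and the function name `user_vector` are assumptions for illustration; the slide only says "linear combination".

```python
def user_vector(track_ids, track_vectors, weights=None):
    """Weighted average of the vectors of the tracks a user played."""
    if weights is None:
        weights = [1.0] * len(track_ids)
    total = sum(weights)
    dim = len(next(iter(track_vectors.values())))
    vec = [0.0] * dim
    for track_id, w in zip(track_ids, weights):
        for k, value in enumerate(track_vectors[track_id]):
            vec[k] += (w / total) * value
    return vec

# Hypothetical learned track vectors.
track_vectors = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}
print(user_vector(["t1", "t2"], track_vectors))          # [0.5, 0.5]
print(user_vector(["t1", "t3"], track_vectors, [1, 3]))  # [1.0, 0.75]
```

Because users and tracks then live in the same vector space, a similarity function between a user vector and a track vector can rank candidate tracks directly.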
word2vec
Words with similar contexts have similar meaning
word2vec
[Diagram labels: Target Word, Context Word]
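Treating a playlist as a sentence, the (target, context) pairs that a skip-gram style model trains on can be generated as below. This is a sketch of the pair extraction only, not of word2vec training itself; the window size and track names are assumptions.

```python
def skipgram_pairs(playlist, window=2):
    """Yield (target, context) pairs for tracks within `window` positions of each other."""
    for i, target in enumerate(playlist):
        lo = max(0, i - window)
        hi = min(len(playlist), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, playlist[j]

playlist = ["track_a", "track_b", "track_c", "track_d"]
pairs = list(skipgram_pairs(playlist, window=1))
print(pairs)
```

Tracks that tend to share neighbors across many playlists end up with similar vectors, which is the playlist analogue of "words with similar contexts".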
word2vec
Target Words and Corresponding Contexts
            shining  bright  trees  dark  green
stars            61      50     10    30      1
sun              71      60      5     2      0
cucumber          2       1     15     3     40
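To see why the table supports the claim that words with similar contexts have similar meaning, compare its rows with cosine similarity. The counts are taken from the table; using cosine similarity on raw counts is an assumption made for illustration, since word2vec itself learns dense vectors rather than comparing counts.

```python
import math

contexts = ["shining", "bright", "trees", "dark", "green"]
counts = {
    "stars":    [61, 50, 10, 30, 1],
    "sun":      [71, 60, 5, 2, 0],
    "cucumber": [2, 1, 15, 3, 40],
}

def cosine(x, y):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

print(round(cosine(counts["stars"], counts["sun"]), 2))       # 0.94
print(round(cosine(counts["stars"], counts["cucumber"]), 2))  # 0.12
```

"stars" and "sun" share the contexts "shining" and "bright", so their rows point in almost the same direction, while "cucumber" is dominated by "green" and "trees".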
word2vec
[Diagram labels: Playlists, CPU, Vectors; Read, Get Vectors & Update]
The Record Store…
The List Maker …
How do you scale this?
Tools of the trade
• Build models in Python.
• Jobs in Scalding + Luigi (https://github.com/spotify/luigi)
• Storm for real time.
• In-house RPC for serving requests.
General Tips
• Analyze, prototype and then build.
• Simpler algorithms are easier to test than more complex ones.
• Data Science is more art than science. Employ the laugh test when evaluating your results.
Join the band!
• Machine Learning, Data & Backend Gigs.
• Now touring in New York, Boston & Stockholm!
Thanks!
Esh Kumar
@eshvk
