Recommending Actions, Not Content


My Keynote from the Social Recommendation Systems workshop for CSCW2011.

  • Here are my notes.
  • There are many of us, but this is the work of three.
  • Are we that bad at this?
  • These verbs have us trapped in 1998…oh ya, and the anti-flash silliness doesn’t help.
  • Recommendation buys us the ability to discover (search) without text.
  • Adapted from “A Dynamic Bayesian Network Click Model for Web Search Ranking,” by Olivier Chapelle, Ya Zhang, WWW’09.
  • Side bar of related people.
  • Bagpipes from: http://www.weddingbagpipes.com/. Beethoven, Orchestral Ode to Joy, from Various (Walt Disney Records) / Classical Silly Songs; along with the Mozart (Symphony No. 40).
  • In a study I performed a few years ago we compared two different approaches for judging music similarity [Slaney and White]. In the classic approach we use music features, often used to judge genre. The assumption is that if these features are good for making genre judgements, then they will also tell us something about similarity. This feature is known as a genregram [Tzanetakis]. The content is rich: it tells us everything we need to know about the music. In fact, listeners can tell whether they like a radio station within seconds of changing the dial.
  • The alternative is an item-to-item judgement based on user ratings. The idea considers each song as a point in a multidimensional space defined by a user’s rating of the song. On a 5-point scale, this is just 2.2 bits of information! If a jazz lover, a rock lover, and a hip-hop lover all give two songs the same rating, then the two songs are probably quite similar.
  • In our study, we used the ratings by XXX listeners of 1000 different songs. After adjusting for missing data, we formed a vector of all user ratings for each song. Song similarity was defined as the correlation between the user-rating vectors for the two songs (sketched in code below).
  • We initially expected that a bias of 50% would be best. This means that strong likes and dislikes would be equally important. But users don’t rate everything. Left: a summary of 717M user ratings. Right: 35k users, rating 10 songs at random.
  • We tested the two song-similarity approaches by starting with a seed song and forming playlists. In a blind test, users overwhelmingly said that the songs on the playlist based on rating data were more similar to each other than those based on the genre space, or a random selection of songs. How can this be? Just 2.2 bits beat out a state-of-the-art system based on content. Problem: how do we figure out the semantics of media signals? We can do simple problems like ASR and OCR. This is the holy grail of image analysis. We want to solve the problem when we have some information about the signal (like a caption). Problem: how do we describe the time course of a podcast, a musical signal, or a movie? What parts are similar to each other? How do we pick out the most salient portions? How do we segment?
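  A minimal sketch of that rating-space similarity (a toy reconstruction; the ratings matrix and function name are mine, not the study’s code):

      import numpy as np

      # Rows are users (jazz, rock, classical lovers); columns are songs 1-3;
      # values are 0-5 star ratings, as in the toy example on slide 41.
      ratings = np.array([
          [5, 0, 5],   # jazz lover
          [5, 0, 5],   # rock lover
          [0, 5, 0],   # classical lover
      ], dtype=float)

      def song_similarity(r, i, j):
          """Pearson correlation between the user-rating vectors of songs i and j."""
          return np.corrcoef(r[:, i], r[:, j])[0, 1]

      print(song_similarity(ratings, 0, 2))  # 1.0  -> songs 1 and 3 look similar
      print(song_similarity(ratings, 0, 1))  # -1.0 -> songs 1 and 2 look dissimilar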
  • Netflix recently hosted a one-million-dollar competition to find a better recommendation system for their movies. It is no overstatement to say that it captured the entire machine-learning community’s interest. Thousands of hours of research, in all different directions, were directed at this problem. While the identity of the users was unknown, the movie titles were not. Researchers quickly identified each movie and analyzed its content. It only makes sense that Alice, who loves romance movies, will like very different content from Bob, who wants action films. We should be able to use this information to build a better recommendation system.
  • But alas, content didn’t help! The winning systems included every possible signal [Koren, Y]. One that surprised me was the amount of time between the movie’s release and the user’s rating: evidently there is a strong correlation, with older movies getting a higher rating. All available signals were combined using boosting, in which various (weak) predictors are combined to make a prediction (the movie’s rating by a new user) if they reduce the error on an unseen test data set. Dozens of different features were included (a minimal version of this combination step is sketched below).
  • Not a single feature was derived from the movie’s content! These were well-motivated researchers, with access to the best of the algorithms in the multimedia literature. But we couldn’t help them. Arguably, the movie’s genre was reflected in the rating data. But in the end the FFT lost to *****’s.
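  The simplest version of that combination step is a linear blend whose weights are fit on held-out data; a minimal sketch under that assumption (my illustration, not the winning team’s code):

      import numpy as np

      # P_probe: each column holds one weak predictor's ratings on a held-out
      # probe set; y_probe: the true ratings there. Fit blend weights by least
      # squares, then apply them to the predictors' outputs for new pairs.
      def fit_blend_weights(P_probe, y_probe):
          w, *_ = np.linalg.lstsq(P_probe, y_probe, rcond=None)
          return w

      P_probe = np.array([[3.1, 2.9], [4.2, 3.8], [1.9, 2.4], [3.5, 3.6]])
      y_probe = np.array([3.0, 4.0, 2.0, 4.0])
      w = fit_blend_weights(P_probe, y_probe)
      print(np.array([[4.8, 4.4]]) @ w)  # blended rating for one new user-movie pair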
  • Transactional. There is MORE to tagging and comments in social media than how we think of it currently, as the single browser/site/startup.
  • These tags and comments are relegated to anchored, explicit annotation. This is the problem. Temporally, there is a gap: we cannot leverage these components like we have with photos. Some tags and notes are added as deep annotation, but that’s rare.
  • Notre Dame!
  • Augsburg Cathedral
  • Australia
  • All tagged Christmas
  • Likewise, the context of an image tells us a LOT about what might be in the image. We like to treat multimedia classification as a simple problem: here is an image, does it show a telephone box? But in the real world every piece of content has a history. At the very least we know it was shot by a real person (or a real person owned the camera). The image was uploaded to a web site, and each web site has a flavor. Photos on the ESPN web site are very different from those at TMZ. Photos uploaded to Flickr are often more artistic than the people shots typical on Facebook. Even more finely, a person who takes lots of pictures of cats will probably have friends who like and take pictures of cats.
  • http://www.flickr.com/photos/wvs/3833148925/ This is a three-part talk where I’ll discuss IM, chatrooms, and Twitter.
  • Gift giving at its finest.
  • So we started looking at classification based on two datasets, YouTube and Zync. Each is about 5000 videos (or sessions).
  • I come from a strong AI family…so I don’t wanna get too into it…
  • So we started to think about what the data was saying to us…
  • Triangulate between the classifier results, the survey results, and the interviews: determine whether the Naïve Bayes classifier or humans are better at deciding whether a video belongs to the “comedy” genre, and determine whether the “ground truth” genre categories provided by the original uploader are reliable.
  • A dare is my favorite type of social recommendation.
  • 72-oz. steak.
  • Conversational.
  • Transcript of "Recommending Actions, Not Content"

    1. Recommending Actions, Not Content. david @ayman shamma. internet experiences, microeconomics & social systems
    2. Internet Experiences Group: (David) Ayman Shamma, Lyndon Kennedy, Jude Yew, Elizabeth Churchill
    3. Disclaimer: I !<3 Recommendation Systems
    4. Disclaimer: I !<3 Recommendation Systems. I <3 Engagement
    5. #FAIL
    6. #FAIL
    7. Really? What are we doing?
    8. Really? What are we doing? What are we recommending? Why are we doing that?
    9. Image Search
    10. Image Search
    11. Image Search: A Text Box!!!!
    12. Search as Recommendation
    13. Play the Music
    14. Click Through on Search Pages: Bottom of “fold”, Bottom of Page, Bottom of 2nd Window. Adapted from “A Dynamic Bayesian Network Click Model for Web Search Ranking,” by Olivier Chapelle, Ya Zhang, WWW’09.
    15. Is Recent a Relevant Recommendation?
    16. Recent / Normal
    17. Does Relevance Matter? • Bottom of the page – Normally low click through – Show alternate results
    18. Does Relevance Matter? • Bottom of the page – Normally low click through – Show alternate results
    19. Does Relevance Matter? • Bottom of the page – Normally low click through – Show alternate results [stamped: WRONG]
    20. Does Relevance Matter? • Bottom of the page – Normally low click through – Show alternate results [stamped: WRONG] Precision/recall doesn’t (always) matter!! (for multimedia)
    21. Un-related images at the bottom of the page should be here. Bottom of “fold”, Bottom of Page, Bottom of 2nd Window. Adapted from “A Dynamic Bayesian Network Click Model for Web Search Ranking,” by Olivier Chapelle, Ya Zhang, WWW’09.
    22. Un-related images at the bottom of the page are here!!! Bottom of “fold”, Bottom of Page, Bottom of 2nd Window. Adapted from “A Dynamic Bayesian Network Click Model for Web Search Ranking,” by Olivier Chapelle, Ya Zhang, WWW’09.
    23. What’s Similar? Have a listen. Song 1, Song 2, Song 3
    24. Song 1
    25. Song 1
    26. Song 2
    27. Song 2
    28. Song 3
    29. Song 3
    30. Context: SongRater
    31. Context: SongRater
    32. Context: SongRater
    33. So what do you like? Song 1, Song 2, Song 3
    34. Song 1
    35. Song 1
    36. Song 2
    37. Song 2
    38. Song 3
    39. Song 3
    40. Think about ratings
    41. Song Similarity Example
                         Song 1   Song 2   Song 3
        Jazz Lover          5        0        5
        Rock Lover          5        0        5
        Classical Lover     0        5        0
    42. Song Similarity Example
                         Song 1   Song 2   Song 3
        Jazz Lover          5        0        5
        Rock Lover          5        0        5
        Classical Lover     0        5        0
        Similar Songs: Song 1 and Song 3
    43. A Small Experiment (by M. Slaney) • 380,911 Subjects • 1000 Jazz Songs • 1,449,335 Ratings. Rating scale: Never Play this Again … Love It!
    44. Users do not rate everything…. Self-Selected Rating Histogram (1.5B ratings); True Rating Histogram (350k ratings). From: Marlin, Zemel, Roweis, Slaney, “Collaborative Filtering and the Missing at Random Assumption,” UAI 2007.
    45. About the Data
    46. About the Data • Real rating data – Y! Music (Y! Data) – 700M ratings
    47. About the Data • Real rating data – Y! Music (Y! Data) – 700M ratings. True Distribution
    48. About the Data • Real rating data – Y! Music (Y! Data) – 700M ratings. True Distribution: likelihood of playing
    49. Netflix Competition • Create new recommendation algorithm – 10% better than Netflix algorithm • Data – 100M ratings – 480k users, 17k movies • Winner – BellKor’s Pragmatic Chaos
    50. Movie rating data • Training data – 100 million ratings – 480,000 users – 17,770 movies – 6 years of data: 2000-2005 • Test data – Last few ratings of each user (2.8 million) – Dates of ratings are given. [Figure: sample (user, movie, score) rows for the training data and (user, movie, ?) rows for the test data.]
    51. Components of a rating predictor: user bias + movie bias + user-movie interaction. Baseline predictor: separates users and movies; often overlooked; benefits from insights into users’ behavior; among the main practical contributions of the competition. User-movie interaction: characterizes the matching between users and movies; attracts most research in the field; benefits from algorithmic and mathematical innovations. Courtesy of Yehuda Koren.
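    For reference, that decomposition is conventionally written as follows (a reconstruction in Koren’s usual notation from the slide’s description; the formula itself is not in the extracted text):

        \hat{r}_{ui} = \underbrace{\mu + b_u + b_i}_{\text{baseline predictor}} + \underbrace{q_i^{\top} p_u}_{\text{user-movie interaction}}

    where \mu is the global mean rating, b_u and b_i are the user and movie biases, and p_u, q_i are latent factor vectors whose inner product captures the user-movie matching.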
    52. This is kinda why we are here...
    53. Legacy Video
    54. Traditional Comments and Tags: Left in Whole, Unattached.
    55. Quickly...let me tell you why I hate tags...
    56. Tag this.
    57. Tag this.
    58. Tag This
    59. Tag Noise
    60. Who’s Christmas? Canada / Australia
    61. Hey, aren’t categories tags anyhow?
    62. Double Rainbow: Pick a category
    63. Anyway, back on track...
    64. Social Conversations Happen Around Media. Dolores Park, San Francisco, 2006
    65. Social Conversations Happen Around Media. Dolores Park, San Francisco, 2006
    66. Social Conversations happen around videos. Well, actually people join in a session and converse afterwards.
    67. What to Collect to measure • Type of event (Zync player command or a normal chat message) • Anonymous hash (uniquely identifies the sender and the receiver, without exposing personal account data) • URL to the shared video • Timestamp for the event • The player time (with respect to the specific video) at the point the event occurred • The number of characters and the number of words typed (for chat messages) • Emoticons used in the chat message
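    A minimal sketch of one such logged event (a hypothetical schema I am supplying for illustration; these field names are mine, not Zync’s actual log format):

        from dataclasses import dataclass

        @dataclass
        class ZyncEvent:
            """One logged event in a shared-video session (hypothetical schema)."""
            event_type: str        # e.g. "play", "pause", "scrub", or "chat"
            sender_hash: str       # anonymized sender id
            receiver_hash: str     # anonymized receiver id
            video_url: str         # URL to the shared video
            timestamp: float       # wall-clock time of the event (unix seconds)
            player_time: float     # position in the video when the event occurred
            n_chars: int = 0       # chat messages only
            n_words: int = 0       # chat messages only
            emoticons: tuple = ()  # emoticons used in the chat message

        event = ZyncEvent("chat", "a1b2", "c3d4", "http://example.com/v/123",
                          1296518400.0, 42.5, n_chars=18, n_words=4, emoticons=(":)",))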
    68. A Short Movie
    69. Percent of actions over time.
    70. Chat follows the video! (CHAT)
    71. http://www.flickr.com/photos/wvs/3833148925/
    72. Reciprocity • In 43.6% of the sessions the invitee played at least one video back to the session’s initiator. • 77.7% sharing reciprocation. • Pairs of people often exchanged more than one set of videos in a session. • In the categories of Nonprofit, Technology, and Shows, the invitees shared more videos.
    73. How do we know what people are watching? How can we give them better things to watch? CLASSIFICATION
    74. Types of features on YouTube
    75. 5-star ratings have been the golden egg for recommendation systems so far; implicit human cooperative sharing activity works better. CLASSIFICATION BASED ON IMPLICIT CONNECTED SOCIAL
    76. 20 random videos sent to 43 people. 60.3% identified the category correctly. 52.3% identified the comedies correctly. PEOPLE REALLY STINK AT THIS
    77. Used and Unused Data
        You Tube              Zync
        Duration (video)      Duration (session)*
        Views (video)         # of Play/Pause*
        Rating*               # of Scrubs*
                              # of Chats*
        You Tube (not used)   Zync (not used)
        Tags                  Emoticons
        Comments              User ID data
        Favorites             # of Sessions
                              # of Loads
    78. Phone in your favorite ML technique. FIRST ORDER DATA WASN’T PRETTY
    79. Naïve Bayes Classification
        Type                       Accuracy
        Random Chance              23.0%
        You Tube Features          14.6%
        You Tube Top 5 Categories  32.4%
        Zync Features              53.9%
        Humans                     60.9%
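    As a sketch of this classification setup (illustrative only; the feature values below are invented placeholders, not the study’s data):

        import numpy as np
        from sklearn.naive_bayes import GaussianNB

        # Toy stand-ins for per-video Zync session features:
        # [session duration (s), # play/pause, # scrubs, # chats]
        X = np.array([[310, 4, 1, 12], [95, 1, 0, 2], [620, 9, 5, 30],
                      [45, 0, 0, 1], [280, 3, 2, 8], [500, 7, 4, 22]])
        y = ["Comedy", "Music", "Comedy", "Music", "Shows", "Comedy"]  # genre labels

        clf = GaussianNB().fit(X, y)
        print(clf.predict([[300, 5, 2, 10]]))  # predicted genre for a new session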
    80. What about these three videos? Which one do you like? Nominal Factorization
    81. Ratings don’t particularly specify order. Nominal Factorization
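    One plausible reading of treating ratings nominally (my interpretation for illustration, not the implementation behind the numbers on the next slide): encode each rating level as its own indicator feature, so no numeric ordering is assumed:

        import numpy as np

        # Ratings treated as nominal categories: a 5 is a different label than
        # a 4, not a larger value, so no order is imposed on the levels.
        ratings = np.array([5, 3, 5, 1, 4])
        levels = np.unique(ratings)                       # [1, 3, 4, 5]
        one_hot = (ratings[:, None] == levels).astype(int)
        print(one_hot)
        # [[0 0 0 1]
        #  [0 1 0 0]
        #  [0 0 0 1]
        #  [1 0 0 0]
        #  [0 0 1 0]]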
    82. Classification with Factoring
        Type                             Accuracy
        Random Chance                    23.0%
        You Tube Features                14.6%
        You Tube Top 5 Categories        32.4%
        YT Top 5 Factoring Duration      51.8%
        Humans                           60.9%
        YT Top 5 Factoring Views         66.9%
        YT Top 5 Factoring Ratings       75.5%
        YT Top 5 Factoring All Features  75.9%
        (psst, yes we know that more training will do the same thing eventually, I just don’t like waiting.)
    83. Classification w/ Zync features
        Type                             Accuracy
        Random Chance                    23.0%
        You Tube Features                14.6%
        You Tube Top 5 Categories        32.4%
        YT Top 5 Factoring Duration      51.8%
        Humans                           60.9%
        YT Top 5 Factoring Views         66.9%
        YT Top 5 Factoring Ratings       75.5%
        YT Top 5 Factoring All Features  75.9%
        Zync Factored All Features       87.8%
        (psst, we are looking at using Gradient Boosted Decision Trees in our future work.)
    84. Finding the viral. Can we predict if a video has over 10M views? More so, can we do so with, say, 10 people across 5 sessions?
    85. Remember, this is what we have for data.
    86. Viral Classification w/ Zync features: does the video have over 10M views?
        Type                            Accuracy
        Guessing Yes                    6.3%
        Guessing No                     93.7%
        Guessing Randomly               88.3%
        Naive Bayes (25% training set)  89.2%
        Naive Bayes (50% training set)  95.5%
        Naive Bayes (80% training set)  96.6%
    87. Three pieces: Classifier, Survey Data, Interviews
    88. Audience Perception is Key. Just ask Homer.
    89. I !<3 Recommendation Systems
    90. 3 areas prime for social recommendation to disrupt:
    91. 1: Understanding the temporal and the recent.
    92. Social Conversations Happen Around Media. Dolores Park, San Francisco, 2006
    93. Social Conversations Happen Around Media. Dolores Park, San Francisco, 2006
    94. Come see my talk!
    95. Let’s find a moment. Here’s an example.
    96. All Tweets / Inauguration Tweets. Left: all tweet sample. Right: tweets with Inauguration keywords.
    97. All Tweets / Inauguration Tweets / All Tweets with @. Left: all tweet sample. Right: tweets with Inauguration keywords.
    98. 12:04 is what you want to watch.
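    A minimal sketch of how one might surface such a moment (my illustration of the idea, not the talk’s actual pipeline): bucket keyword-matching tweets by minute and report the busiest bucket:

        from collections import Counter

        def peak_minute(tweets, keywords):
            """tweets: iterable of (minute_str, text); returns the busiest minute."""
            counts = Counter(minute for minute, text in tweets
                             if any(k in text.lower() for k in keywords))
            return counts.most_common(1)[0] if counts else None

        tweets = [("12:03", "waiting for the inauguration"),
                  ("12:04", "inauguration oath right now!!"),
                  ("12:04", "there it is, the oath. inauguration!"),
                  ("12:05", "lunch time")]
        print(peak_minute(tweets, ["inauguration", "oath"]))  # ('12:04', 2)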
    99. 2: Q & A
    100. Likes, Generalization, Questioning the Question, Clarification, One Answer. Finding answers... ...kinda like Watson.
    101. 3: Challenges
    102. Me: You’re in China, go to the night market for [street food]!!
    103. Me: You’re in China, go to the night market for [street food]!! My friend: Street food? Are you kidding? I’ll get sick!
    104. Me: You’re in China, go to the night market for [street food]!! My friend: Street food? Are you kidding? I’ll get sick! Me: I dare you not to!!
    105. Me: You’re in China, go to the night market for [street food]!! You: Street food? Are you kidding? I’ll get sick! Me: I dare you not to! (It’s delicious!)
    106. Man vs. Food: http://www.travelchannel.com/TV_Shows/Man_V_Food
    107. Why try to understand engagement? Better advertising. Better understanding of the relationship between users and the sharing/consumption of media content. Better organization and classification of media for efficient navigation and content retrieval. Better recommendations!
    108. Find me: @ayman • aymans@acm.org. Fin & Thanks! Thanks to D. DuBois, M. Slaney, E. Churchill, L. Kennedy, J. Yew, S. Pentland, A. Brooks, J. Dunning, B. Pardo, M. Cooper.
        Knowing Funny: Genre Perception and Categorization in Social Video Sharing. Jude Yew; David A. Shamma; Elizabeth F. Churchill. CHI 2011, ACM, 2011.
        Peaks and Persistence: Modeling the Shape of Microblog Conversations. David A. Shamma; Lyndon Kennedy; Elizabeth F. Churchill. CSCW 2011, ACM, 2011.
        In the Limelight Over Time: Temporalities of Network Centrality. David A. Shamma; Lyndon Kennedy; Elizabeth F. Churchill. CSCW 2011, ACM, 2011.
        Tweet the Debates: Understanding Community Annotation of Uncollected Sources. David A. Shamma; Lyndon Kennedy; Elizabeth F. Churchill. ACM Multimedia, ACM, 2009.
        Understanding the Creative Conversation: Modeling to Engagement. David A. Shamma; Dan Perkel; Kurt Luther. Creativity and Cognition, ACM, 2009.
        Spinning Online: A Case Study of Internet Broadcasting by DJs. David A. Shamma; Elizabeth Churchill; Nikhil Bobb; Matt Fukuda. Communities & Technologies, ACM, 2009.
        Zync with Me: Synchronized Sharing of Video through Instant Messaging. David A. Shamma; Yiming Liu. In Pablo Cesar, David Geerts, Konstantinos Chorianopoulos (eds.), Social Interactive Television: Immersive Shared Experiences and Perspectives, Information Science Reference, IGI Global, 2009.
        Enhancing online personal connections through the synchronized sharing of online video. Shamma, D. A.; Bastéa-Forte, M.; Joubert, N.; Liu, Y. Human Factors in Computing Systems (CHI), ACM, 2008.
        Supporting creative acts beyond dissemination. David A. Shamma; Ryan Shaw. Creativity and Cognition, ACM, 2007.
        Watch what I watch: using community activity to understand content. David A. Shamma; Ryan Shaw; Peter Shafton; Yiming Liu. ACM Multimedia Workshop on Multimedia Information Retrieval (MIR), ACM, 2007.
        Zync: the design of synchronized video sharing. Yiming Liu; David A. Shamma; Peter Shafton; Jeannie Yang. Designing for User eXperiences, ACM, 2007.
