The interplay of personal
preference and social influence
in sharing networks
Amit Sharma
Ph.D. candidate, Dept. of Computer Science
Cornell University
B-Exam
Whose favorite was Friday (and why)?
Sharing networks are popular
Two ways of looking at the world
RECOMMENDATION SYSTEMS
Model individual preferences of
users
Does not consider explicitly the
effect of sharing or social processes
[Ma et al. ‘09, Konstas et al. ‘08,
Jamali and Ester ‘10, Sharma and
Cosley ‘11, Sharma and Yan ‘13]
NETWORK DIFFUSION
Model spread of items in a social
network, one at a time
Does not consider an individual’s
preferences over items
[Watts ‘02, Kempe et al. ‘03, Bakshy
et al. ‘09, Lerman and Ghosh ‘10 ]
Towards understanding people’s adoption
and sharing decisions
Interplay between people's own preference, social influence
and system elements such as recommendations or feeds.
Preference
models
Influence
estimation
System
context
Example 1: How do friends-based
recommendations compare with those computed
from the full network?
[Sharma-Gemici-Cosley 2013]
MOVIES
MUSIC
HASHTAGS
# Friends ~ 100-500
# Non-friends ~ 50k
Example 2: Which social explanation would
influence you to try out a musical artist?
[Figure: rating distributions under the social explanations "10 of your friends like this", "Dan and Levent like this", and "10 of your friends like this. Dan likes this."; panels labeled a = 0.61, a = 0.74, a = 0.66]
[Sharma-Cosley 2013]
MUSIC
a = Rigidness
My contributions
EXPERIMENTAL
• The effect of personal
preference and influence
on people’s decisions
– Item adoption [WWW ‘13]
– Item sharing [CSCW ‘15]
OBSERVATIONAL
• Aggregate effect of
people’s activities over the
sharing network
– Extent of preference
locality [ICWSM ‘13]
– Influence estimation [In
submission to CSCW ‘16]
Understanding people’s adoption and sharing decisions
How do people’s
preferences influence their
sharing decisions?
How much do social feeds
influence people’s actions
on items?
Future Work
Outline
Directed sharing: Questions
Why did she share that item?
Does she like it? Will he like it?
Can we predict what items she will share to him?
Two motivations for sharing
Word-of-mouth
Individuation
– Establish a distinct identity
for oneself
Altruism
– Help others
[Dempsey et al. 2010]
Online Content sharing
Sender’s preferences
– Sender shares what she
likes
Recipient’s preferences
– Sender shares what recipient
would like
Comparing sender’s rating versus recipient’s rating for a
shared item can indicate the relative effect of these
motivations.
Research questions
• RQ1: To what extent do people tend to share items that
they like themselves (individuation) versus those that they
perceive to be relevant for the recipient (altruism)?
• RQ2: Can we predict whether an item is shared based on
sender’s and recipient’s preferences?
Person A’s
movie Likes
Compute recs.
Person B’s
movie Likes
Compute recs.
Combine recs.
A paired experiment on Facebook (N=118)
10 Recs. for me
10 Recs. for partner
To mitigate social
influence effects, my
partner is not shown
which movies were
shared by me.
Own-Algo Other-Algo
MOVIES
People share what they like themselves
Rating by Senders
Frequency
Senders rate shared items higher than
recipients
Mean sender rating: 4.19
Mean recipient rating: 3.88
(Paired t-test)
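A minimal sketch of the paired t-test named above, assuming each shared item has one sender rating and one recipient rating. The function name and the toy ratings are made up for illustration and are not the study's data or code.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(sender, recipient):
    """Return (t, degrees of freedom) for a paired t-test on two rating lists."""
    if len(sender) != len(recipient) or len(sender) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    # The test works on per-item differences: sender rating minus recipient rating.
    diffs = [s - r for s, r in zip(sender, recipient)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Toy ratings, invented for illustration only.
sender_ratings = [5, 4, 4, 5, 3, 4]
recipient_ratings = [4, 4, 3, 5, 3, 3]
t_stat, dof = paired_t_statistic(sender_ratings, recipient_ratings)
```

A positive t means senders rated the shared items higher on average; the p-value would then be read from a t distribution with `dof` degrees of freedom.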
Sender Rating – Recipient Rating
Frequency
Responses support individuation
“Usually when I suggest, it depends on the item, not the
target individual, because I want to share what I enjoyed.”
(P8)
“I suggest because I like something and I want to see if other
people feel the same way about an item.” (P91)
Altruism:
“I make suggestions to people if I think they might gain
enjoyment. Obviously it really depends on their personality
and their likes/dislikes.” (P22)
Data from people who did not see all
recommendations
• Due to lack of Like data or API errors.
[Figure: recommendation lists (Recs. for me, Recs. for partner) seen in three conditions: Both-Shown, Own-Shown, Other-Shown]
Ratings for shared items depend on item set
shown
[Figure: Own-Shown and Other-Shown conditions, and Both-Shown condition (Recs. for me, Recs. for partner)]
Sender’s µ = 4.4
Recipient’s µ = 3.7
(***)
Sender’s µ = 4.06
Recipient’s µ = 4.28
(ns)
Own-Algo: (***)
Other-Algo: (ns)
[Paired t-test]
Salience of items impacts what gets shared.
A preference-salience model
“I try to assess if the individual that I am recommending
to would like the movie that I am suggesting. Otherwise,
I do not tell them about the movie, and may think of
someone else who would like the movie.” (P5)
People’s own preferences determine
shareable items.
Among these candidates, some
become salient based on the context.
They are shared if sharer thinks they
are suitable for the recipient.
Other plausible models
High Quality Model
– No difference between overall IMDB ratings for shared and non-
shared movies.
Misguided Altruism Model
– Senders’ ratings are higher for shares than non-shares.
RQ2: Can we predict what is shared?
• Classification task: Given a sender, recipient and an item,
decide whether it was shared or not.
• Randomly sampled an equal number of non-shares.
• Use 10-fold cross validation and a decision tree classifier.
• Evaluation metrics: Precision and recall.
• Features:
– IMDB average rating, popularity for item
– Recipient’s predicted rating for item
– Sender’s predicted rating for item
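The evaluation setup above can be sketched as follows. This is only an illustration under stated assumptions: a one-feature threshold rule on the sender's predicted rating stands in for the decision tree, the 4.0 cutoff is hypothetical, cross-validation is left out, and the records are toy data.

```python
import random


def balanced_sample(shares, non_shares, seed=0):
    """Keep all shares and randomly sample an equal number of non-shares."""
    rng = random.Random(seed)
    # rng.sample raises ValueError if there are fewer non-shares than shares.
    return list(shares), rng.sample(list(non_shares), len(shares))


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (truthy means shared)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy (sender, recipient, item) records reduced to one feature; invented values.
shares = [{"sender_rating": 4.5}, {"sender_rating": 4.2}, {"sender_rating": 3.5}]
non_shares = [{"sender_rating": r} for r in (3.0, 4.1, 2.5, 3.8, 4.6)]

pos, neg = balanced_sample(shares, non_shares)
y_true = [1] * len(pos) + [0] * len(neg)
# Hypothetical stand-in classifier: predict "shared" when the sender rating is high.
y_pred = [1 if rec["sender_rating"] >= 4.0 else 0 for rec in pos + neg]
precision, recall = precision_recall(y_true, y_pred)
```

Balancing the classes keeps a trivial "never shared" predictor from looking accurate, which is why precision and recall are reported rather than raw accuracy.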
Better precision with sender-based features
[Bar chart: Precision and Recall (axis 0 to 90) for features IMDB Rating, Popularity, Recipient-Item Rating, Sender-Item Rating, All]
Summary
• RQ1: Individuation (personal preferences) dominates the
decision process for directed content sharing.
• RQ2: Based on sender and recipient preferences, we can
(noisily) predict what is shared.
As with adoption (Example 2), people’s own
preferences dominate their sharing decisions.
Understanding people’s adoption and sharing decisions
People’s personal
preferences dominate their
sharing decisions
How much do social feeds
influence people’s actions
on items?
Future Work
Outline
How much and why do people copy feed
actions?
Virality is rare: the vast majority of shares spread
to zero or one degree [Goel et al. ‘12].
Most studies on social media find a nontrivial
correlation between the activities of a user
and her friends [Sharma and Cosley ’13].
Q: Can we determine how many copy actions are
caused by influence from friends?
In general, hard to infer from observational
data alone [Shalizi and Thomas ‘11].
Many processes for generating a common
action by friends
• Social Influence
• Homophily
Without controlling for homophily, we may
overestimate influence [Aral et al. ’09, Lewis et al. ‘12].
A testable definition for influence
Influence: Deviation from the expected activity based
on following one’s personal preference for items.
• Based on a data-driven model of personal preferences.
– Past actions represent a user’s preferences.
• Based on an explicit model of the system elements that
expose people to others’ activities.
– Assume reverse chronological feed. A user scans it from top to
bottom.
Controlling for homophily using preference
similarity
Use observed activity to create a proxy for homophily.
[Figure: user u with friends f1 to f5 (Friends panel) and with non-friends n1 to n5 (Non-Friends panel); values shown in each panel: 0.4, 0.7, 0.3, 0.6, 0.5]
Estimating the actions due to influence
For each action by a user, construct feeds from friends and
non-friends containing their last M actions respectively.
FriendsOverlap = Fraction of actions done by u that are also
in the friends’ feed
(naïve measure of influence [Ghosh et al. ‘10, Bakshy et
al. ‘11]).
NonFriendsOverlap = Fraction of actions done by u that are
also in the non-friends’ feed.
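The two overlap measures can be sketched as below. This assumes an action log of (user, item, timestamp) tuples and reads "feed" as the M most recent actions by the whole group before the user's action; both the log format and that reading are assumptions, and the function names are hypothetical.

```python
def feed_items(actions, group, before, m=10):
    """Items in the m most recent actions by `group` made before time `before`."""
    recent = sorted(
        (a for a in actions if a[0] in group and a[2] < before),
        key=lambda a: a[2],
        reverse=True,  # reverse chronological, like the assumed feed
    )[:m]
    return {item for _, item, _ in recent}


def overlap(actions, user, group, m=10):
    """Fraction of the user's actions whose item was already in the group's feed."""
    own = [a for a in actions if a[0] == user]
    if not own:
        return 0.0
    hits = sum(1 for _, item, t in own if item in feed_items(actions, group, t, m))
    return hits / len(own)


# Toy log of (user, item, timestamp) tuples, invented for illustration.
actions = [
    ("f1", "songA", 1), ("f2", "songB", 2), ("n1", "songE", 2),
    ("u", "songA", 3), ("u", "songC", 4), ("u", "songD", 5),
]
friends_overlap = overlap(actions, "u", {"f1", "f2"})
non_friends_overlap = overlap(actions, "u", {"n1"})
```

In the toy log only songA was in the friends' feed before u acted on it, so one of u's three actions counts toward the friends' overlap.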
The full procedure
MATCHING STEP (before time T)
For each user:
Construct a set of non-friends that are as similar to the
user as her friends.
ESTIMATION STEP (after time T)
For each user:
Influence_u = FriendsOverlap – NonFriendsOverlap
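A rough sketch of the two steps, under assumptions the slides do not spell out: preference similarity is taken to be Jaccard similarity over items acted on before T, and matching is a greedy nearest-similarity pick per friend. Both choices, and all names, are illustrative rather than the procedure actually used.

```python
def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_non_friends(user, friends, candidates, items_before_t):
    """For each friend, pick the unused non-friend whose similarity to `user` is closest."""
    own = items_before_t.get(user, set())

    def sim(other):
        return jaccard(own, items_before_t.get(other, set()))

    available = set(candidates) - set(friends) - {user}
    matched = []
    for friend in sorted(friends):
        if not available:
            break
        target = sim(friend)
        best = min(sorted(available), key=lambda v: abs(sim(v) - target))
        matched.append(best)
        available.remove(best)
    return matched


def influence_estimate(friends_overlap, non_friends_overlap):
    """Influence_u: friends' overlap minus matched non-friends' overlap."""
    return friends_overlap - non_friends_overlap


# Toy item sets observed before time T, invented for illustration.
items_before_t = {
    "u": {"a", "b", "c"},
    "f1": {"a", "b"}, "f2": {"c", "d"},
    "n1": {"a", "b", "x"}, "n2": {"c", "y"}, "n3": {"z"},
}
matched = match_non_friends("u", {"f1", "f2"}, ["n1", "n2", "n3"], items_before_t)
```

The matched non-friends then play the role of a control group: their overlap with the user reflects shared preferences alone, since the user never sees their activity in a feed.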
The Last.fm dataset
                   LISTEN SONG   LOVE SONG
# Ego Networks     96K           141K
# Total Users      312K          437K
# Total Songs      23M           13M
# Total Actions    656M          140M
Size of feed (M) = 10
Time T is chosen such that 90% of actions are before T.
Random seeds, Weighted breadth-first crawl for 3 months
Validation using semi-synthetic Loves data
Personal preference: Choose a song randomly from the last M loves by
the k-most similar users (k=10).
Influence process: Choose a song randomly from the last M loves by her
friends.
Process                    FriendsOverlap   Influence   Std. Error
Personal Preference (PP)   0.042            0.001       0.0001
Influence (I)              1.00             0.99        0.0004
I-PP (10%-90%)             0.15             0.102       0.0001
Generate synthetic loves on songs after time T from any of the processes,
keeping the timestamps and the social network same as before.
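One way to picture the semi-synthetic generation is the sketch below. It assumes the two feeds are already built as lists of songs and that the mixed I-PP process draws from the influence process with a fixed probability; the function name and toy feeds are hypothetical.

```python
import random


def synthetic_love(friends_feed, similar_users_feed, p_influence, rng):
    """Draw one synthetic love.

    With probability p_influence the song comes from the friends' feed
    (influence process); otherwise it comes from the feed of the k most
    similar users (personal preference process).
    """
    source = friends_feed if rng.random() < p_influence else similar_users_feed
    return rng.choice(source)  # raises IndexError if the chosen feed is empty


# Toy feeds, invented for illustration.
friends_feed = ["f_song1", "f_song2"]
similar_users_feed = ["p_song1", "p_song2", "p_song3"]

rng = random.Random(0)
pure_influence = [synthetic_love(friends_feed, similar_users_feed, 1.0, rng) for _ in range(50)]
pure_preference = [synthetic_love(friends_feed, similar_users_feed, 0.0, rng) for _ in range(50)]
mixed = [synthetic_love(friends_feed, similar_users_feed, 0.1, rng) for _ in range(50)]
```

Because the generating process is known, the estimate recovered from the synthetic loves can be compared against the true share of influence-driven actions, as in the table above.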
FriendsOverlap overestimates influence by at
least 300% across listen and love actions.
Is this specific to Last.fm?
Assumptions of Influence Estimation:
Reverse chronological feed
Preferences as a proxy for homophily
Can be applied to any sharing platform that shows friends’ activities in a
(loosely) reverse chronological order.
RATE BOOKS
FAVORITE
PHOTOS
RATE MOVIES
# Ego Networks 252K
# Total Users 252K
# Total Items 1.3M
# Total Actions 28M
# Ego Networks 49K
# Total Users 50K
# Total Items 48K
# Total Actions 7.9M
# Ego Networks 175K
# Total Users 183K
# Total Items 11M
# Total Actions 33M
[Huang et al. ‘12] [Jamali and Ester ‘10] [Cha et al. ‘09]
FriendsOverlap overestimates influence in all
three domains
Overestimate by 14% in Flickr, more than 500% in Flixster.
Influence is overrated(?)
Not more than 1% of user actions on online sharing networks
can be attributed to influence.
Understanding people’s adoption and sharing decisions
People’s personal
preferences dominate their
sharing decisions
Less than 1% of people’s
actions due to social
influence.
Future Work
Outline
Claim: Modeling both preference and influence
leads to better understanding of diffusion
Improves estimates of the effect of social influence
Suggests personalization strategies for social recommenders
Points to preference-aware models of diffusion
[Bar chart: Precision and Recall (axis 0 to 100) for features IMDB Rating, Recipient-Item Rating, Sender-Item Rating]
Future Work
RECOMMENDATION SYSTEMS
Reason about properties such as
influence, diversity, utility and
privacy.
Build recommendation
algorithms that account for
these properties.
NETWORK DIFFUSION
Develop diffusion models that
incorporate people’s preference
and social influence.
Evaluate by accuracy in
predicting adoptions or shares.
Thank you
Collaborators: Dan Cosley, Mevlana Gemici, Michael Triche, Yulan Miao,
Meethu Malu
Directed sharing: More altruism?
• Meformers versus informers: ~80% of content shared on
Twitter was about the user [Naaman et al. 2008]
• In directed sharing, there is a known recipient
– Expect altruism to be more important
Design implications
Recommender systems for effective sharing
• Recommending what to share, who to share it to.
E.g., Feedme system [Bernstein et al. 2010]
Diffusion models with directed sharing
• Accounting for sender and recipient preferences
Decision Tree for sharing
Influence per user
shows a starker
contrast
For loves on songs, more than
50% of users have a zero or
lower influence estimate.
For loves on artists, influence
increases but accounting for
preference similarity still
cancels out most of
FriendsOverlap.
A simple decision model for ratings
User's receptiveness to
an explanation.
[Effect of Explanation]
User's discernment in music.
[Base Decision Process]
Coldplay
+
Amit Sharma likes
this.
A simple decision model for ratings
Base Decision Process
f(x) = D e^{-Dx}    D: Discernment
Effect of Explanations
Mixture Model
h(x) = a f(x) + (1-a) g(x) a: Rigidness
µ: Receptivity
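The mixture can be written out as a small sketch. The slide text does not give the form of g(x), so it is passed in as a function here; the normal density centered at a receptivity-like value µ used in the example is purely a placeholder assumption, as are the parameter values.

```python
import math
from statistics import NormalDist


def base_density(x, d):
    """f(x) = D e^{-Dx}: exponential density with rate D (discernment)."""
    return d * math.exp(-d * x)


def mixture_density(x, a, d, g):
    """h(x) = a f(x) + (1 - a) g(x), where a is rigidness and g is caller-supplied."""
    return a * base_density(x, d) + (1 - a) * g(x)


# Placeholder for the explanation effect: NOT taken from the slides.
g = NormalDist(mu=7.0, sigma=1.0).pdf

# Hypothetical parameter values, for illustration only.
h_at_low_rating = mixture_density(1.0, a=0.66, d=0.5, g=g)
h_at_high_rating = mixture_density(7.0, a=0.66, d=0.5, g=g)
```

With a close to 1 the ratings follow the decreasing base process almost entirely; smaller a gives the explanation component more weight, which would show up as a bump in the rating distribution.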
Model explains variance in people’s ratings
[Figure panels: a = 0.61, a = 0.74, a = 0.71, a = 0.66, a = 0.66]
People are differently susceptible to
explanation
Recommender Systems: Opportunities for personalized
explanation strategies.
[Figure: probability versus user rating for User Cluster 1, User Cluster 2 and User Cluster 3]
Future work
• Temporal evolution of preference
• All friends aren’t equal
• Exploring new domains, towards general models of
behavior
• Feedback between experimental and observational studies
Implementing the test on real network data
Core user: Any user for whom at least 75% of friends have preferences
in our dataset.
Datasets from Facebook and Twitter
Activity data: Movie and music Likes on Facebook, hashtag
usage on Twitter
Datasets

The interplay of personal preference and social influence in sharing networks [Ph.D. defense talk]

Editor's Notes

  • #2 Good morning! Thank you all for coming. I am Amit Sharma, a Ph.D. student in computer science. It has been a wonderful journey at Cornell, and today I am going to talk about my research on understanding the role of personal preference and social influence in sharing networks.
  • #3 Some time in 2011, my Facebook feed got flooded with my friends listening to this song called Friday by Rebecca Black. Out of curiosity, I clicked the link and here is how it sounds: terrible song. Did my friends really like this song? I wanted to know so that I could unfriend that person. Then it occurred to me that most of my friends may have watched this video the same way I did: by seeing a lot of their friends doing so. Really, the news feed acted as a vehicle of social influence, for me and many others. This example illustrates the interplay between influence and users' own preference when items are adopted. In this case, not so much about people's preferences in music. :) Notes: It had 80% dislikes on YouTube, yet was driving a lot of visits.
  • #4 That was just one example. It is not just music: our activities in diverse domains such as movies, photos, e-commerce and even programming are being mediated by sharing networks. I define a sharing network loosely, as any network where people share items with a self-curated list of social connections. On Last.fm, for example, I can view which songs my friends loved; Goodreads shows me the books my friends rated; Facebook recommends movies based on my friends; and on GitHub, I can see my friends’ activity on repositories. The question that comes up is: how much are we influenced by seeing our friends’ activities, versus following our own preference? ---OMIT--- and within most of them, there are feeds that present friends’ activities in loosely reverse chronological order. Talk about whether they match our preferences? Or are we just shown something we don’t like, such as Friday? Common Characteristic: Feeds/social recommenders that show users their friends’ activities Notes: An increasingly social world… Our lives are increasingly mediated by online social networks. When I started out my Ph.D., social was taking off. Now it is almost everywhere, ubiquitous. Make it clear that I’m talking about items, not updates from friends.
  • #8 There are two major ways of looking at such questions around people’s preferences and social influence. The first is the recommender systems perspective. I started out in the recommendation systems community, which aims to model individual preferences in a social network. Early on in my Ph.D., I became interested in the idea of using social networks to generate recommendations and published some papers. But I gradually realized that there is much more going on in a social network: first, people can share items with each other, and second, the underlying social processes of influence also guide what is shared or what is adopted. It became clear to me that to build any effective recommendation system designed for social networks, the effect of these forces is important to model. Network diffusion research provides a comprehensive study of how items spread in a social network. But most of this work considers items one at a time, which is a drawback, because items are not adopted or shared in a vacuum. They are likely adopted based on each person’s preferences for items, which can be learnt from her past activity. Combining these two world-views can lead to a better understanding of people’s adoption and sharing decisions. ---END------- Research Agenda: Infusing preference modeling from recommendations and explicit modeling of social influences into the study of adopting and sharing #My research aims to take the best of both worlds for # and unify these two world-views, by infusing preference modeling and an explicit understanding of social influences in the study of adopting and sharing items.
  • #9 Thus, in my research, I focus on estimating the effects of influence in sharing networks, and I use preference models from recommender systems to define and operationalize influence, as I will show later. The system context is also important to model: the practical recommendation and feed interfaces, and the algorithms behind them, determine what people see in sharing networks. The overall goal of my research is to make sense of this interplay between people's own preference, social influence and the recommendations that are shown to them. --OMIT--- For example, how does the network's activity affect recommendations? How do the recommendations affect the network itself? The first one concerns how to design/build recommendations in sharing networks and the second concerns understanding the influence of such recommendations. Notes: [Recommendations, diffusion models may improve, improve social experience for us]
  • #10 Here are two examples of how we might understand this interplay. I talked about both of these in my A-exam, so I will go over them briefly here. The first example asks a simple question: how similar are friends’ activities to the user’s? One way to answer that is to compare recommendations that are based only on friends’ data to those that are computed using the full network. I used datasets from Facebook for movies and music, and from Twitter for hashtags, to build a recommender based only on friends. This plot shows the NDCG (higher is better) as a measure of recommendation accuracy. Even though friends are much fewer, they perform comparably to non-friends in recommendation. This shows that there is some locality in preferences in sharing networks.
  • #11 The next question then is: how does this locality in preferences emerge? It could be because knowing about friends’ activities influences people. So I studied the effect of exposure to friends’ activities on people’s willingness to try out an item, similar to how many sharing networks present items to users. For example, say you are shown a new musical artist, one you’ve never heard of, accompanied by one of these two social explanations. One explanation tells you that 10 of your friends like this, and another that Dan likes it. Which one would you choose? What I found was that named-friend explanations are more powerful than a count of friends: this plot shows people’s ratings on a 0-10 scale, and the bumps signify the effect of influence. Through a generative model of people’s ratings, the main result was that social explanations only have a secondary effect. A major part of the ratings was still guided by people’s personal preference (a decreasing distribution for unknown items), while explanations could lead to a small bump. ----OMIT-- Answering questions like these requires a new kind of understanding of recommender systems and the social influence processes they embody, and how they operate through such social feeds. --- As another example, consider that we observe some similarity in music activity between two individuals. Is it because of actual influence between the two, or just a matter of homophily, in that people select similar people to be their friends? Answers to these questions will greatly impact the kind of recommender systems that will be effective for these users, e.g., whether we emphasize a person’s own preference versus activity in their neighborhood, or how we show other people’s activity. They would also go a long way in deepening our understanding of diffusion on these networks.
  • #12 Today, I will present some of my recent work on adoption and sharing, covering both experimental and large-scale observational studies. Both are important for understanding sharing networks. With experiments, we want to know people's motivations and considerations for sharing and adopting items and how they interact with social influence, and I will show that for sharing. In the observational studies, we are interested in the aggregate effect of these activities over the network, and I will show a novel procedure for estimating influence. ---- [To change] This requires both micro and macro-social investigations. At the micro level, I am interested in people's motivations and considerations for sharing and accepting items and how they interact with social influence. At the macro level, we are interested in the aggregate effects of these activities over the network. To gain a complete understanding, feedback between the two scales of inquiry is essential.
  • #13 So here’s the gameplan. I hope I’ve convinced you of the importance of studying adoption and sharing decisions. I will first present a study on the role of preferences in sharing decisions and then move on to estimating influence. After studying the role of personal preference and influence in adoption, I conducted a study on sharing where we want to see how much of a role personal preferences play in people’s sharing decisions. I will talk about that first, and then talk about a novel procedure to estimate the effect of influence in adoption data on multiple websites.
  • #14 Let us consider a scenario where the blue user shares a movie, Avatar, with her friend. We call this directed sharing because there is a known recipient. The following questions come up about the sharing process: Why did she share that item? Did she like the item? Will he like it? And further, if we think in terms of prediction, can we model which items are shared?
  • #15 From past work on word-of-mouth sharing, we know that there are two major motivations to share: individuation and altruism. Individuation refers to the need to establish a distinct identity for oneself, and altruism refers to the motivation to help others. In the online world, we can map these motivations to observable data on preferences of users. For example, individuation can be mapped to the sender’s preferences: intuitively, a sender expresses her identity by sharing what she likes. Similarly, altruism can be mapped to the recipient’s preferences: the sender tends to share what the recipient would like. Thus, here is the key assumption: comparing a sender’s rating to a recipient’s rating for a shared item can indicate the relative effect of these two competing motivations. I would like to add here that sharing is a complex process and many more factors, such as tie strength and norms, may impact a sharing decision, but we look at it through the lens of people’s preferences.
  • #16 Specifically, here are the two research questions we want to tackle. First, to what extent do people share items that they like themselves versus those that they perceive to be relevant for the recipient? We assume preferences are proxies for individuation and altruism. And second, can we predict whether an item is shared based on the sender’s and recipient’s past preferences?
  • #17 To answer these questions, we conducted a paired experiment using Facebook as the underlying social network. We created a Facebook app where a participant chooses one of her Facebook friends as her partner for the study. The system fetches movie Likes for each participant and computes movie recommendations for them based on the k-nearest neighbors algorithm for collaborative filtering. We then combine these recommendations, randomly order them, and present this combined set to each of the participants. We do this so that both participants look at the same set of items, which in expectation contains items matching the sender’s as well as the recipient’s preferences.
  • #18 Here is how the interface looked. Suppose I am one of the participants. I would see a total of 20 recommendations, 10 from each algorithm. I can rate these items and share them with my partner; participants could rate or share as many as they wished. The rating is on a 5-point scale. When I click on recommend, my partner’s name is shown by default. Note that a user does not know which items were shared by their partner. Also, we designed the study so that partners did not need to complete the study at the same time.
  • #19 Our first observation is that the sender’s rating for shared items is higher than for non-shared items. This shows a histogram of sender ratings for shared and non-shared items. Note how for ratings above 4, the number of shared items is higher than the number of non-shared items. For ratings below 4, it is more likely that the item was not shared.
  • #20 Our second observation is that when we consider shared items only, sender rating is significantly higher than recipient rating. Here, we show a plot for the difference between sender and recipient rating for shared items. You can see that there is more weight towards the right of zero for this distribution. These results suggest that individuation, and thus people’s own preference, plays a dominant role even when people are sharing directly to known recipients.
  • #21 Participants’ responses to a questionnaire about their sharing practices also supported the individuation motivation. For instance, people wanted to share what they enjoyed, and wanted to see if others felt the same way about an item. But people also claimed that they were personalizing for the recipient, even though the rating data clearly shows that they were sharing what they like.
  • #22 To explain these contrasting results, we turn to some accidental but incredibly useful data. It turns out that many people did not see all 20 recommendations: this was due to lack of Like data or API errors. In effect, we had three groups of participants: those who were shown a mix of both types of recommendations, as we discussed; those who were only shown recommendations for themselves; and finally, those who were shown recommendations computed from their partner’s preferences. This allows us to compare the recipient’s rating for shared items based on what senders saw. (Key phrase: Recommendations based on your preferences, recommendations based on your partner’s preferences)
  • #23 (Note: own before other.) When senders see only recommendations for themselves (Own-Shown): ratings for shared items by senders are significantly higher than those by recipients. When senders see only recommendations for the recipient (Other-Shown): ratings for shared items by senders are still high, but recipients’ ratings are comparable. Important: in the Other-Shown case, the sender and recipient ratings for shared items are comparable. The sender’s rating is still high, but senders become more effective sharers when recommendations are tuned to the recipient’s preferences. Both-Shown: the same effect appears when dividing the items shown to Both-Shown participants by the underlying algorithm. (Key phrase: Recommendations based on your preferences, recommendations based on your partner’s preferences)
  • #24 And this leads us to propose a preference-salience model of sharing, as exemplified by this quote, where a person shares what she likes, but customizes it for the recipient. That is,
  • #25 We also considered other plausible models but our data did not support them. People do try to customize shares to recipients but fail because of imperfect knowledge
  • #26 We now move to our second research question. Precision: the percentage of items returned by the model that were actually shared. Recall: the percentage of actually shared items that were returned by the model.
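The two measures on this slide can be sketched in a few lines. This is only an illustration: the function name and the set-based inputs are assumptions, and the notes do not specify the prediction model that produces the "returned" items.

```python
def precision_recall(predicted_shared, actually_shared):
    """Precision and recall of a set of predicted shares.

    predicted_shared: items the (hypothetical) model returned as shared.
    actually_shared: items that participants really shared.
    """
    predicted = set(predicted_shared)
    actual = set(actually_shared)
    hits = len(predicted & actual)
    # Precision: fraction of returned items that were actually shared.
    precision = hits / len(predicted) if predicted else 0.0
    # Recall: fraction of actually shared items that were returned.
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

For example, if a model returns three items and two of them are among three actually shared items, both precision and recall are 2/3.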
  • #27 IMP: This indicates the recipient rating is a better predictor for shares, but there are many shared items that do not have a high recipient rating. When predicting shares based on the sender’s preferences, we get high precision and recall, suggesting that the sender’s own preferences drive a sharing decision. This is promising, indicating that we can predict more than three-fourths of the shares. Thus, we can noisily predict whether an item gets shared based on sender and recipient preferences.
  • #28 Alright, just as for adoption earlier in example 2, for sharing too, we find that people’s personal preferences dominate their decisions.
  • #29 Alright, to see how our experimental results for the effect of personal preference and influence transfer to other networks and item domains, we now turn to large-scale user activity on a broad range of websites. We focus on activity feeds, which are a common feature of sharing networks and arguably, a big way through which people are exposed to their friends’ activities. --OMIT--- We explore this next, by trying to find how many actions can be attributed to influence due to social feeds in sharing networks. To shed light on this question, we’ll look at some large-scale user activity data from a few popular sharing networks. Notes: Explicitly named friends (influence) more impactful than count of friends (conformity). The effect of social explanation varies with different strategies and different people. Can be used for personalized explanation strategies. Social explanations have a secondary effect. Still, aggregate effects can be modelled. A generative model gives us a window into people’s decision process.
  • #30 Here’s an example from Last.fm of a typical social feed that is shown in sharing networks, showing friends’ activities in reverse chronological order. From past work on networks like Twitter, we know that a vast majority of shares spread only zero or one degree beyond the initial sharer. Still, most studies on social media find a nontrivial correlation between the activities of a user and her friends, as we saw earlier. The question then is: can we ascertain how much activity is actually caused by influence from seeing others’ activities? In general, this is hard to infer, but we’ll see how far we can get by using people’s preferences and the specific nature of feeds in sharing networks.
  • #31 Now, you may ask: why is identifying influence hard? Because when we discover a common action between a user and her friend, there could be multiple reasons for it. Suppose you are looking at a stream of activity from Last.fm and you find two friends who loved the same song in quick succession. Several processes can lead to such a copy action. It could be that one of them saw the other’s activity and, being receptive to social influence, copied the friend’s action. However, it could also be due to homophily: it just so happens that the two friends have similar preferences and both of them were simply following their own preference. Thus, without controlling for homophily, we may overestimate the actual extent of influence. --- Finally, two users might get exposed to the same song simultaneously, e.g. through mass media such as television, or through concerts or local shows. --- Other confounding factors: external exposure to an ad or sharing a common context (e.g. going to a concert).
  • #32 To make progress, I propose the following testable definition of influence based on personal preference. Instead of using general definitions that are hard to operationalize, I define influence as the deviation from expected activity from following one’s personal preference for items and ground it specifically in the context of feeds in sharing networks. This has two advantages: one, it is based on a data-driven model of personal preference: past actions can be used to represent a user’s preferences. And second, it uses an explicit model of exposure to others’ activities, allowing us to verify and extend the measures of influence for different sharing networks. For this work, I assume a reverse chronological feed and assume that users scan it from top to bottom.
  • #33 Based on this definition, I now propose a procedure for estimating influence. I use the observation that, by design, users of sharing networks can only see their friends’ activities in their feed. Thus, if there were a control set of users who were as similar to a user as her friends and whose activities the user did not see, that would give a baseline for the common actions due to homophily. For every friend of a user, we select a non-friend whose similarity with the user is the same as that of the friend. In this way, we construct a set of non-friends that are as similar to the user as the original friends. Omit: Now, past work has used user attribute data to find such control users. However, it is often not possible to get detailed data on each user, and even if we do, one needs to be convinced that the set of attributes is sufficient to explain the similarity in actions between users. Instead, we directly use past activity data of users to create a preference profile for each user, then compute the similarity in those preferences, and use that to control for homophily. --- Thus, comparing the common actions between a user and her friend with those between the user and a non-friend should give an estimate of the actual influence. ---OMIT-- Since people's preferences derive partly from demographics [17, 26], we are, indirectly, observing their demographic attributes, as well as other hidden attributes that drove these preferences.
  • #34 Now we can use these two sets---friends and non-friends---to estimate the actions due to influence. For each action by a user, construct two feeds, one is the actual feed, containing the last M actions from friends and the other, a synthetic feed containing the last M actions of non-friends. M is the size of the feed. Now, we consider the number of actions by a user that were present in the friends feed and the nonfriends feed. If we were to compute the fraction of actions that are also in the corresponding friends feed, we get a naïve measure of influence. If however, we subtract the actions that were also present in the nonfriends feed, and thus possibly could have happened without the feed, then we get a more accurate estimate of influence.
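The estimation idea on this slide can be sketched as follows. This is one possible reading of the notes, not the exact procedure used in the study: it treats the estimate as the fraction of a user's actions found in the friends feed minus the fraction found in the synthetic non-friends feed. The `Action` record, the function names, and the `after_time` parameter are all illustrative assumptions, and `m` is assumed to be at least 1.

```python
from collections import namedtuple

# Hypothetical record for one action (e.g. a love or a listen).
Action = namedtuple("Action", ["user", "item", "time"])


def last_m_items(actions, users, before_time, m):
    """Items in the last m actions by `users` strictly before `before_time`."""
    recent = sorted(
        (a for a in actions if a.user in users and a.time < before_time),
        key=lambda a: a.time,
    )[-m:]
    return {a.item for a in recent}


def overlap_fraction(actions, user, others, m, after_time):
    """Fraction of the user's actions (after `after_time`) found in the feed of `others`."""
    own = [a for a in actions if a.user == user and a.time > after_time]
    if not own:
        return 0.0
    hits = sum(
        1 for a in own if a.item in last_m_items(actions, others, a.time, m)
    )
    return hits / len(own)


def influence_estimate(actions, user, friends, non_friends, m=10,
                       after_time=float("-inf")):
    """Return (naive FriendsOverlap, overlap corrected by the non-friends baseline)."""
    friends_overlap = overlap_fraction(actions, user, friends, m, after_time)
    baseline = overlap_fraction(actions, user, non_friends, m, after_time)
    return friends_overlap, friends_overlap - baseline
```

The first returned value corresponds to the naive measure; the second subtracts the overlap that also occurs with similar users whose activity was never shown in the feed.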
  • #35 More concretely, our procedure has two steps: A matching step that constructs the non-friends set using activity data before some time T. And then an estimation step, that computes the overlap over friends and non-friends after time T. Finally, an aggregate measure of influence can be obtained as the mean of influence over each user.
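As an illustration of the matching step, the sketch below pairs each friend with the non-friend whose similarity to the user is closest to that friend's. The notes do not say which similarity measure was used, so Jaccard similarity over pre-T item sets is an assumption here, as are the function names and the choice to match without replacement.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of items (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_non_friends(user, friends, profiles):
    """Pick one non-friend per friend with the closest similarity to the user.

    profiles: dict mapping each user to the set of items acted on before time T.
    Every friend is assumed to have an entry in `profiles`.
    """
    candidates = {c for c in profiles if c != user and c not in friends}
    matched = {}
    for friend in sorted(friends):
        if not candidates:
            break
        target = jaccard(profiles[user], profiles[friend])
        # Sorting the candidates makes tie-breaking deterministic.
        best = min(
            sorted(candidates),
            key=lambda c: abs(jaccard(profiles[user], profiles[c]) - target),
        )
        matched[friend] = best
        candidates.remove(best)  # match without replacement (assumption)
    return matched
```

The matched non-friends then serve as the control set whose activities after time T populate the synthetic feed.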
  • #36 Now, to study the effect of influence, I collected a dataset using the Last.fm API. Last.fm is a unique website in that it gives us both implicit and explicit preference data: listening to and loving a song, where listening is the more implicit activity. I used a random set of users to seed the crawl and then used weighted BFS to crawl the network. Thus, we have two datasets, both having around a hundred thousand ego networks and more than 10 million songs. For the results I will show you, I consider the size of the feed to be 10, and time T is selected such that 90% of the actions are before time T. The results are robust to changes in these parameters. ----OMIT--- All Loves collected for users since Feb 2014. Listens only for 3 months since Feb 2014
  • #37 Let’s first do some validation of the influence procedure. The idea is to artificially generate people’s actions based on influence or homophily, and then evaluate the estimates of influence that we get. Thus, we generate synthetic loves on songs after time T from either of the processes, keeping the timestamps and the social network the same as before. First is the personal preference process, where a user chooses a song randomly from the last M loves by the k most similar users. Similarly, in the influence process, a user chooses a song randomly from the last M loves by her friends. In all cases, we find that the influence measure gives a more accurate estimate of real influence in the network than FriendsOverlap.
  • #38 And when we look at the real data, for both the listen and love actions, we find that FriendsOverlap overestimates the number of actions due to influence by at least 300%. Another striking point to note is that influence is really low. The estimate can be interpreted as the average fraction of actions that are due to influence, and the graph shows that not more than 0.5% of actions can be attributed to influence. Notes: have a backup slide for variation with M. We tried other values of T and M, and these results are robust to the choice of M and T.
  • #39 This suggests that influence from feeds plays a really minor role in how people act on songs on Last.fm. A natural question to ask is how general these results are: do they hold on other sharing networks too? Now, our influence procedure is based on two key assumptions: one is the presence of a reverse chronological feed, and the other is that we consider preferences as a proxy for homophily. Thus, we could apply this procedure to any sharing platform that shows friends’ activities in reverse chronological order and provides preference data. So I applied the procedure to datasets covering a broad range of online preference data: book ratings on Goodreads, movie ratings on Flixster, and favoriting photos on Flickr. While they all have a social network and a feed, the properties of the network differ widely. E.g. Flixster is a dense dataset in terms of items: there are about 163 ratings per movie on Flixster, while there are only about 3 favorite actions per photo on average on Flickr. Variations: Number of actions per item is 163 for Flixster versus 3 for Flickr. Number of actions per user is relatively similar.
  • #40 And... we find a similar story. FriendsOverlap overestimates influence, by anywhere from about 14% on Flickr to more than 500% on Flixster. It is useful to pause here and speculate on why Flixster shows such a big variation: it could be that, due to the dense item collection, there are a lot of instances when friends rate a common item but these are not necessarily due to influence. Similarly, for Flickr, given that there are fewer favorites per photo, if two friends favorite the same photo, it is much more likely to be due to influence. There could be other reasons too, like the prominence of the feed, how item consumption differs, whether items exist externally, and so on.
  • #41 But what’s interesting is that even with all these differences, the actual actions that can be attributed to influence fall in a narrow range: below 1% of the total actions by users. Really, these results show that influence is overrated, at least in sharing networks like these, and they question the common narrative around the big effect of feeds and social influence in sharing networks. First of all, our influence procedure shows that naïve measures like FriendsOverlap greatly overestimate the influence. Secondly, when we account for homophily, the actual estimates are really a tiny fraction of all the activity on these sharing networks.
  • #42  In summary, both the experimental and observational evidence suggests that personal preference is still the dominant driver of people’s activities. Influence only plays a secondary role.
  • #43 More generally, these studies show the value of modeling both preference and influence. I showed how to accurately estimate the effect of social influence using preferences. This will be useful for evaluating the effect of feeds on sharing networks, and also for marketers to evaluate their strategies. Studies on adoption and sharing also help recommendation systems: they inform how to incorporate social information and how to show recommendations.
  • #44 In the future, I plan to continue on this path of unifying the study of recommendation and diffusion in sharing networks. One stream of work is to create new diffusion models that explicitly take into account people’s preferences. On the recommender systems side, I will continue asking questions about properties such as utility and influence, especially the long-term influence of such systems. E.g. Reason about properties of social recommendation: Social recommendation leads to “filter bubbles”. Social recommendation invades our privacy. When is social recommendation (not) useful?
  • #45 I thank my collaborators who helped through this journey! ----OMIT--- to lead to the next generation of feeds and recommendation systems in sharing networks.
  • #46 A quick side note: in past work on sharing in broadcast social media, people tend to share mostly about themselves, indicating that individuation is dominant. In directed sharing, there is a known recipient and thus, we would expect that people would think about the recipient and customize their shares, making more altruistic suggestions.
  • #47 So what does it mean in terms of design implications? FeedMe suggests whom to share items with, thus making them more salient. More generally, our findings show the importance of accounting for sender and recipient preferences in sharing, which can be used to augment diffusion models: most diffusion models consider the spread of items one at a time, ignoring people’s preferences.
  • #50 Remove this slide
  • #51 So then here’s a simple model of how participants might have processed the recommendations and submitted a rating. Given an item, a user decides whether or not she likes the item, depending on how discerning, or how choosy, she is. Also, the user looks at the explanation, which adds to her rating based on how receptive she is to the explanation shown. The combination of these two processes leads to the final rating.
  • #52 Let's use this intuition to build up a generative model that will tell us more about the relative effect of people’s personal preferences towards the artists and the influence of social explanations. We can model the base decision process as a decreasing distribution, since given a random item that you are not familiar with, you are more likely to give it a low rating. We model the effect of explanations using a symmetric (Gaussian) distribution, centered around a mean representing the receptivity of an individual to the explanation. Finally, the relative contribution of each effect is controlled by the rigidness of an individual, which gives our mixture model for the generation of ratings. In a sense, the rigidness parameter tells us how much importance a user gives to her own preference about items.
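To make the mixture concrete, here is a minimal sketch of the rating distribution it implies. The notes only say that the preference component is decreasing and the explanation component is Gaussian; the geometric-style decay, the fixed `sigma`, and all function and parameter names below are assumptions chosen for illustration.

```python
import math


def rating_probabilities(rigidness, discernment, receptivity,
                         sigma=1.0, max_rating=10):
    """Probability of each rating 0..max_rating under a two-component mixture.

    rigidness:   weight on the personal-preference component (0 to 1).
    discernment: decay rate of the decreasing component (assumed geometric form).
    receptivity: mean of the Gaussian explanation component.
    """
    ratings = range(max_rating + 1)
    # Preference component: low ratings are most likely for unfamiliar items.
    pref = [(1.0 - discernment) ** r for r in ratings]
    # Explanation component: symmetric bump centered at the receptivity.
    expl = [math.exp(-((r - receptivity) ** 2) / (2 * sigma ** 2)) for r in ratings]
    pref_total, expl_total = sum(pref), sum(expl)
    return [
        rigidness * p / pref_total + (1.0 - rigidness) * e / expl_total
        for p, e in zip(pref, expl)
    ]
```

With a high rigidness the distribution is almost purely decreasing; lowering it makes the bump around the receptivity more visible, which is the secondary effect of explanations described in these notes.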
  • #53 And we find that our model can explain much of the variance in these plots. We show the distribution of ratings for each explanation strategy, with the x-axis showing the ratings from 0 to 10. Note that GoodFriend-based strategies and Overall Popularity show low rigidness, which is the relative contribution of personal preference, suggesting that they were the most convincing, as we saw with the mean ratings. Again, this is a secondary effect. More generally, we could interpret this as the adoption step of a diffusion model, where a mixture of people’s own preference and influence decides whether they want to try out an item. --- Two things worth noting. All strategies give a discernment of 0.4, which should be expected since discernment does not depend on explanation. GoodFriend-based strategies show the lowest rigidness, which is what we previously saw with the mean ratings and users' comments. Models for all explanation strategies show same discernment ~0.47
  • #54 Such a model may also be used for recommendation. When we cluster users based on their mean ratings, we find 3 clusters that look similar to the components of the mixture model. It seems that users in cluster 1 were not influenced by the explanation at all, while those in cluster 3 are the most receptive. While this is for all kinds of explanation, we also saw different effects for each explanation strategy. This suggests an opportunity for personalized explanation strategies: not showing the same kind of explanation to everyone, but personalizing it for each individual. [how many ratings were there]
  • #55 Accounting for differences in friend relationships; modeling each friend’s influence separately. Long-term changes in preference? Studying social influence and recommendations for taste-based domains such as gaming, and for non-taste-based domains. Richer integration between micro and macro analyses.
  • #56 Core user: Any user for whom at least 75% of friends have preferences in our dataset.
  • #57 I first collected data from popular social networks. The first dataset is for liking musical artists on Facebook, the second for liking movies on Facebook, and the third for using hashtags on Twitter. These data were collected in an ego-centric fashion: for each user who opted in, we got their preference data and that of their friends. Core users are the users who opted in. On average, users had about 20 likes and each item had close to 10 likes. Having datasets from different sharing networks and domains is useful to get a general picture and not be biased by the properties of a certain network.