CIKM 2011 Social Computing Industry Invited Talk

  1. CIKM 2011 | Invited Talk: Model-Driven Research in Social Computing
     Ed H. Chi, Google Research
     Work done while at Palo Alto Research Center (PARC)
     2011-10-27 CIKM 2011 Invited Talk
  2. Some Google Social Stats
     • 250,000 words are written each minute on Blogger - that's 360 million words a day
     • Every 16 seconds people view enough photos from Picasa Web Albums to cover an entire football field
     • Every 8 minutes, more photos are viewed on Picasa Web Albums than exist in the entire Time-LIFE photo collection
  3. YouTube Stats
     • 150 years of YouTube video are watched every day on Facebook (up 2.5x y/y)
     • Every minute, 400+ tweets contain YouTube links (up 3x y/y) [Q1 2011]
     • 100M+ people take a social action with YouTube (likes, shares, comments, etc.) every week (10/15/10)
  4. Google+ Stats
     • 40 million people have joined Google+ since launch.
     • People are 2x-3x more likely to share content with one of their circles than to make a public post.
  5. Social Stream Research
     • Analytics
       – Factors impacting retweetability [Suh et al., IEEE Social Computing 2010]
       – Location field of user profiles [Hecht et al., CHI 2011]
       – Organic Q&A behaviors [Paul et al., ICWSM'11]
       – Languages used in Twitter [Hong et al., ICWSM'11]
     • Improving Stream Experience
       – Topic-based summarization & browsing of tweets [Bernstein et al., UIST 2010]
       – Tweet recommendation [Chen et al., CHI 2010 & CHI 2011]
  6. Invisible Brokerage Signals across Language Barriers
     Joint work w/ Lichan Hong, Gregorio Convertino
     [Hong et al., ICWSM July 2011]
  7. Motivation for Studying Languages
     • Twitter is an international phenomenon
       – Most research focused on English users
       – Question about generalization to non-English
       – Understand cross-language usage differences
       – Design implications for international users
     • Research Questions:
       – What is the language distribution in Twitter?
       – How do users of different languages use Twitter?
       – How do bilingual users spread information across languages?
  8. Data Collection & Processing
     • Twitter stream, 04/18/10-05/16/10 (4 weeks)
     • 62M tweets
     • Google Language API & LingPipe
     • 104 languages
     • Top 10 languages
  9. Top 10 Languages in Twitter

     Language         Tweets      %       Users
     English      31,952,964   51.1   5,282,657
     Japanese     11,975,429   19.1   1,335,074
     Portuguese    5,993,584    9.6     993,083
     Indonesian    3,483,842    5.6     338,116
     Spanish       2,931,025    4.7     706,522
     Dutch           883,942    1.4     247,529
     Korean          754,189    1.2     116,506
     French          603,706    1.0     261,481
     German          588,409    1.0     192,477
     Malay           559,381    0.9     180,147
  10. Human-Coding Study
      • 2,000 random tweets from 62M tweets
      • 2 human judges for each of the top 10 languages
        – native speakers or proficient
        – discuss to resolve disagreement
      • Hard to find Indonesian & Malay judges
      • Presented 2,000 tweets to each judge
      • Judge selected tweets in his/her language
  11. Machine vs. Human
      T-P: true positive, T-N: true negative, F-N: false negative, F-P: false positive

      Language     T-P    T-N   F-N   F-P   Cohen's Kappa
      English      974    971    20    35   0.95
      Japanese     370  1,595     0    35   0.94
      Portuguese   170  1,803    19     8   0.92
      Indonesian   106  1,875    15     4   0.91
      Spanish       96  1,889    11     4   0.92
      Dutch         18  1,978     2     2   0.90
      Korean        24  1,976     0     0   1.00
      French        13  1,980     0     7   0.79
      German        12  1,979     2     7   0.72
      Malay          8  1,979     4     9   0.55
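The kappa column can be reproduced from the four counts in each row. The sketch below uses the standard two-rater Cohen's kappa formula; the function name `cohens_kappa` is chosen for illustration, and it assumes the machine and the human judges are treated as the two raters over the same 2,000 tweets.

```python
def cohens_kappa(tp, tn, fn, fp):
    """Cohen's kappa for a 2x2 machine-vs-human agreement table."""
    n = tp + tn + fn + fp
    # Observed agreement: both raters say "in language" or both say "not".
    observed = (tp + tn) / n
    # Marginal totals for the "in language" label.
    machine_pos = tp + fp
    human_pos = tp + fn
    # Agreement expected by chance, from the marginals.
    expected = (machine_pos * human_pos
                + (n - machine_pos) * (n - human_pos)) / (n * n)
    return (observed - expected) / (1 - expected)


# Rows taken from the slide: (T-P, T-N, F-N, F-P)
rows = {
    "Japanese": (370, 1595, 0, 35),
    "Korean": (24, 1976, 0, 0),
    "Malay": (8, 1979, 4, 9),
}
for language, counts in rows.items():
    print(language, round(cohens_kappa(*counts), 2))
```

Kappa corrects raw accuracy for chance agreement, which is why Malay scores only 0.55 even though 1,987 of its 2,000 judgments agree: nearly all of that agreement is on tweets that are not Malay.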
  12. Accuracy of Language Detection
      • Two Types of Errors
        – Got ur dirct msg.i'm lukng 4wrd 2 twt wit u too.so,wat doing ha… (detected as Afrikaans)
        – High error rate for tweets of 1~2 words
  13. Machine vs. Human

      Language     T-P    T-N   F-N   F-P   Cohen's Kappa
      French        13  1,980     0     7   0.79
      German        12  1,979     2     7   0.72
      Malay          8  1,979     4     9   0.55

      • French: 5/7 F-Ps have 2 words
      • German: 1/2 F-Ns has 1 word; 6/7 F-Ps are in English
      • Malay: 3/4 F-Ns & 7/9 F-Ps are in Indonesian
  14. Common Twitter Conventions
      • hashtag
      • mention
      • URL
      • reply (per-tweet metadata)
      • retweet
  15. Use of URLs in 62M Tweets

      Language     URLs
      All          21%
      English      25%
      Japanese     13%
      Portuguese   13%
      Indonesian   13%
      Spanish      15%
      Dutch        17%
      Korean       17%
      French       37%
      German       39%
      Malay        17%

      • Chi Square tests confirmed that differences by language are significant.
  16. Significant Cross-Language Differences

      Language     URLs   Hashtags   Mentions   Replies   Retweets
      All          21%    11%        49%        31%       13%
      English      25%    14%        47%        29%       13%
      Japanese     13%     5%        43%        33%        7%
      Portuguese   13%    12%        50%        32%       12%
      Indonesian   13%     5%        72%        20%       39%
      Spanish      15%    11%        58%        39%       14%
      Dutch        17%    13%        50%        35%       11%
      Korean       17%    11%        73%        59%       11%
      French       37%    12%        48%        36%        9%
      German       39%    18%        36%        25%        8%
      Malay        17%     5%        62%        23%       29%

      Chi Square tests confirmed that differences by language are significant
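To illustrate the kind of test the slide refers to, here is a minimal sketch of a Pearson chi-square statistic on a contingency table. The slides do not give the raw counts or the exact test setup, so the counts below are an assumption: they are approximated by multiplying the Korean and German tweet totals from the Top 10 Languages table by the rounded URL percentages. The function name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if language and URL use were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Approximate counts (assumption): tweet totals times rounded URL rates.
korean_total, german_total = 754_189, 588_409
korean_urls = round(korean_total * 0.17)
german_urls = round(german_total * 0.39)
table = [
    [korean_urls, korean_total - korean_urls],  # Korean: with URL, without
    [german_urls, german_total - german_urls],  # German: with URL, without
]
statistic = chi_square(table)
# With 1 degree of freedom, the 0.05 critical value is 3.841.
print(statistic > 3.841)
```

With samples this large, even small percentage differences produce a large statistic, so the effect sizes in the table matter more than the significance itself.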
  17. Implications

      Language     URLs   Hashtags   Mentions   Replies   Retweets
      All          21%    11%        49%        31%       13%
      Korean       17%    11%        73%        59%       11%
      German       39%    18%        36%        25%        8%

      • Use of Twitter for social networking vs. information sharing differs across languages
      • Design of recommendation engines
        – Korean users: promote conversational tweets
        – German users: promote tweets with URLs
  18. Studying Bilingual Brokers
      • Importance of brokers
        – Structural holes (Burt'92), LiveJournal (Herring et al.'07)
      • Define bilingual brokers as users who tweeted in a pair of languages
      • Caveats
        – Under-estimated due to 4-week time limit
        – Over-estimated due to language detection errors
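The broker definition above can be sketched as a counting procedure: collect the set of detected languages per user, then count each user once for every pair of languages in that set. The input format (one `(user_id, language)` pair per tweet), the function name, and the sample data are assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_bilingual_brokers(tweets):
    """Count users per language pair, given (user_id, language) per tweet."""
    languages_by_user = defaultdict(set)
    for user_id, language in tweets:
        languages_by_user[user_id].add(language)

    pair_counts = Counter()
    for languages in languages_by_user.values():
        # A user who tweeted in k languages is a broker for each of the
        # k*(k-1)/2 pairs; sorting makes the pair key order-independent.
        for pair in combinations(sorted(languages), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical sample: u1 tweets in en+ja, u2 in en+ja+pt, u3 only in en.
sample = [
    ("u1", "en"), ("u1", "ja"), ("u1", "en"),
    ("u2", "en"), ("u2", "ja"), ("u2", "pt"),
    ("u3", "en"),
]
brokers = count_bilingual_brokers(sample)
print(brokers[("en", "ja")])
```

Both caveats on the slide show up directly in this procedure: a short collection window misses users whose second language did not appear in it, and a single misdetected tweet is enough to add a spurious language to a user's set.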
  19. Number of Bilingual Brokers
      (E = English, J = Japanese, P = Portuguese, I = Indonesian, S = Spanish, D = Dutch, K = Korean, F = French, G = German, M = Malay)

                E        J        P        I        S        D        K        F        G
      J   140,730
      P   488,545   13,228
      I   230,023    4,825   29,405
      S   359,117   10,139  112,524   36,068
      D   150,041    6,383   30,855   34,906   30,916
      K    19,722    6,384      906    2,014    1,109      972
      F   194,931   10,463   53,607   34,586   49,445   33,568    1,244
      G   110,748    6,053   22,106   21,471   21,989   22,162      786   24,763
      M   148,365    4,208   31,184  135,427   31,967   29,331    1,518   30,257   18,301
  20. Sharing URLs Across Languages

               E       J       P       I       S       D       K       F       G       M
      E        -   3,013  18,399     985   4,986   1,144     212   1,791   1,647     540
      J    3,013       -      77      37      58      29      43      59      46      18
      P   18,399      77       -      74   1,644     198       2     453     168     123
      I      985      37      74       -      67      64       1      53      38     279
      S    4,986      58   1,644      67       -     139       0     286     139      53
      D    1,144      29     198      64     139       -       2     112     126      48
      K      212      43       2       1       0       2       -       3       3       1
      F    1,791      59     453      53     286     112       3       -     157      53
      G    1,647      46     168      38     139     126       3     157       -      40
      M      540      18     123     279      53      48       1      53      40       -
  21. Sharing Hashtags Across Languages

               E       J       P       I       S       D       K       F       G       M
      E        -   8,178  33,197  14,969  27,284   6,685     798   9,410   7,208   5,517
      J    8,178       -     331     135     351     218     149     352     260     100
      P   33,197     331       -     535   4,682     604      13   1,231     580     400
      I   14,969     135     535       -     762     684      25     713     415   6,046
      S   27,284     351   4,682     762       -     819      28   1,468     708     463
      D    6,685     218     604     684     819       -      26     851     769     424
      K      798     149      13      25      28      26       -      25      18      20
      F    9,410     352   1,231     713   1,468     851      25       -     879     411
      G    7,208     260     580     415     708     769      18     879       -     265
      M    5,517     100     400   6,046     463     424      20     411     265       -
  22. Implications
      • Indicators of connection strength between languages
        – Number of bilingual brokers
        – Acts of brokerage: sharing URLs & hashtags
      • English well connected to others, and may function as a hub
      • Need to improve cross-language communications
  23. Visible Social Signals from Shared Items
      Kudos to Jilin Chen, Rowan Nairn
      [Chen et al., CHI 2010]
      [Chen et al., CHI 2011]
  24. Eddi: Summarizing Social Streams
  25. Information Gathering/Seeking
      • The Filtering Problem:
        – "I get 1,000+ items in my stream daily but only have time to read 10 of them. Which ones should I read?"
      • The Discovery Problem:
        – "There are millions of URLs posted daily on Twitter. Am I missing something important there outside my own Twitter stream?"
  26. Stream Recommender
      • Zerozero88.com
        – Twitter as the platform
        – URLs as the medium
        – Produces your personal headlines
  27. [Diagram: recommender architecture]
      Components: URL Sources; User Topic Profiles; Topic Relevance Scores; Local Social Network; Social Network Scores
      Recommendation Engine:
      • Multiply scores
      • Rank URLs using multiplied scores
      • Recommend highest ranked URLs
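The three engine steps (multiply, rank, recommend) can be sketched as follows. This is a minimal illustration, not the zerozero88 implementation: the function name `recommend`, the dictionary inputs, the default of 1.0 for a URL with no score, and the sample URLs and values are all assumptions.

```python
def recommend(candidate_urls, topic_scores, social_scores, k=5):
    """Rank candidate URLs by topic score x social score; return the top k."""
    def combined(url):
        # Assumption: a URL without a score gets a neutral factor of 1.0,
        # so passing an empty dict disables that score entirely.
        return topic_scores.get(url, 1.0) * social_scores.get(url, 1.0)

    return sorted(candidate_urls, key=combined, reverse=True)[:k]


# Hypothetical candidates and scores.
candidates = ["url_a", "url_b", "url_c"]
topic = {"url_a": 2.0, "url_b": 1.0, "url_c": 3.0}
social = {"url_a": 3.0, "url_b": 1.0, "url_c": 1.0}
top_two = recommend(candidates, topic, social, k=2)
print(top_two)
```

Multiplying rather than adding means a URL needs to do reasonably well on both signals to rank highly; a strong topic match with almost no social support is pulled down, and vice versa.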
  28. URL Sources
      • Considering all URLs was impossible
      • FoF: URLs from followee-of-followees
        – Social Local News is Better
      • Popular: URLs that are popular across whole Twitter
        – Popular News is Better

      Component     Possible Design Choices
      URL Sources   FoF (followee-of-followees)
                    Popular
  29. [Diagram: recommender architecture]
      Components: URL Sources; User Topic Profiles; Topic Relevance Scores; Local Social Network; Social Network Scores
      Recommendation Engine:
      • Multiply scores
      • Rank URLs using multiplied scores
      • Recommend highest ranked URLs
  30. Topic Relevance Scores
      [Figure: example terms (Funny, YouTube, Video, Funny, Game, …) with scores (1.3, 5.5, 0.5, 4.0, 2.1, …)]
  31. Topic Profile of URLs
      • Built from tweets that contain the URL
      • However, tweets are short
        – term vectors for URLs are often too sparse
      • Adopt a term expansion technique using a search engine

      Example tweet: Best of Show CES 2011: The Motorola Atrix http://tcrn.ch/e0g3Oh
      Add to Profile: smartphone, mobility, …
  32. Topic Profile of Users
      • Self-Topic: content profile based on my posts
        – My Interest as Information Producer
      • Followee-Topic: content profile based on my followees' posts
        – My Interest as Information Gatherer
      • None, for comparison purpose

      Component                Possible Design Choices
      Topic Relevance Scores   Self-Topic
                               Followee-Topic
                               None
  33. My Followees
      [Diagram: Collect & Profile each followee; Find Top Key Terms per followee profile; Aggregate terms into my profile]
      A term is weighted higher in your profile if more of your followees have the term as their top key terms
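The aggregation rule on this slide can be sketched as a vote count: each followee contributes its top key terms, and a term's weight in the aggregated profile is the number of followees that list it. The profile representation (a dict of term weights), the `top_n` cutoff, the function name, and the sample terms are assumptions for illustration; the slides do not specify them.

```python
from collections import Counter


def followee_topic_profile(followee_profiles, top_n=3):
    """Weight each term by how many followees have it among their top terms."""
    votes = Counter()
    for profile in followee_profiles:
        # Top key terms of one followee: highest-weighted terms first.
        top_terms = sorted(profile, key=profile.get, reverse=True)[:top_n]
        votes.update(top_terms)
    return votes


# Hypothetical followee profiles (term -> weight).
profiles = [
    {"ces": 3.0, "phone": 2.0, "cat": 0.5},
    {"phone": 4.0, "ces": 1.0, "dog": 0.2},
    {"phone": 1.5, "music": 1.0, "ces": 0.1},
]
my_profile = followee_topic_profile(profiles, top_n=2)
print(my_profile.most_common())
```

Counting followees rather than summing raw term weights keeps one very prolific followee from dominating the profile.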
  34. [Diagram: recommender architecture]
      Components: URL Sources; User Topic Profiles; Topic Relevance Scores; Local Social Network; Social Network Scores
      Recommendation Engine:
      • Multiply scores
      • Rank URLs using multiplied scores
      • Recommend highest ranked URLs
  35. Social Network Scores
      • "Popular Vote" among my followees-of-followees
        – People "vote" for a URL by tweeting it
        – URLs with more votes in total are assigned a higher score
        – Votes are weighted using social network structure
      • None, for comparison purpose

      Component               Possible Design Choices
      Social Network Scores   Social Voting
                              None
  36. The Intuition: Local Influence
      [Diagram: "Me" and two groups of people connected by follow links, labeled "15 People" and "5 People"]
      Whose URLs should be weighted higher?
  37. Possible Recommender Designs

      Component                Possible Design Choices
      URL Sources              FoF (followee-of-followees)
                               Popular
      Topic Relevance Scores   Self-Topic
                               Followee-Topic
                               None
      Social Network Scores    Social Voting
                               None

      Recommendation Engine:
      • Multiply scores
      • Rank URLs using multiplied scores
      • Recommend highest ranked URLs

      • 2 (URL source) x 3 (topic score) x 2 (social score) = 12 possible algorithm designs in total
      • Random selection if for both scores we chose None
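The design space above is a Cartesian product of the three components, which a few lines can enumerate. The list and variable names are illustrative; the option labels are taken from the table.

```python
from itertools import product

URL_SOURCES = ["FoF", "Popular"]
TOPIC_SCORES = ["Self-Topic", "Followee-Topic", "None"]
SOCIAL_SCORES = ["Social Voting", "None"]

# Every combination of one choice per component.
designs = list(product(URL_SOURCES, TOPIC_SCORES, SOCIAL_SCORES))

# With no topic score and no social score there is nothing to rank by,
# so these designs fall back to random selection from the URL source.
random_designs = [d for d in designs if d[1] == "None" and d[2] == "None"]

print(len(designs), len(random_designs))
```

The two random-selection designs (one per URL source) serve as baselines for the other ten.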
  38. Study Design
      • Within-subject design
      • Each subject evaluated 5 URL recommendations from each of the 12 algorithms
        – Show 60 URLs in random order, and ask for binary rating
        – 60 ratings x 44 subjects = 2640 ratings in total
  39. Summary of Results
      [Chart labels: Popular URLs; FoF URLs; Social Vote Only; Best Performing]
  40. Algorithms Differ Not Only in Accuracy!
      • Relevance vs. Serendipity in recommendations
      • From a subject in the pilot interview of zerozero88:
        – "There is a tension between the discovery and the affirming aspect of things. I am getting tweets about things that I am already interested in. Something I crave …, is an element of surprise or whimsy. ... I am getting a lot of things I am interested in, but that is not necessarily a good thing for me personally"
  41. Design Rule
      • Interaction costs determine number of people who participate
        – Surplus of attention & motivation at small transaction costs
      • Therefore:
      • Important to keep interaction costs low
        – Recommendation
        – Summarization
      • Or bring new benefits
      [Chart: # People willing to participate (y-axis) vs. Cost of participation (x-axis)]
      2008-05-13 CSCL 2011 Keynote
  42. Thank you!
      • chi@acm.org
      • http://edchi.net
