Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides

Slides for the class "Mining the Social Web".

Published in: Education

  1. mining the social web
     Aristides Gionis, Michael Mathioudakis
     firstname.lastname@aalto.fi
     Aalto University, Spring 2015
  2. social web
     facebook, twitter, linkedin, foursquare, flickr, instagram, pinterest, youtube, ustream, github, stackoverflow, wikipedia
  3. social web
     websites and platforms that enable users to
     • produce content: blog posts, ‘status’ messages, videos, pictures, podcasts
     • consume content: read text (blog posts, ‘status’ messages), listen to podcasts, watch videos
     • interact with each other: comment on each other’s posts, ‘like’ or rate items
  4. mining the social web
     a lot of users... a lot of data... what could we learn*?
     * assuming we have the data - more on that later
     gain insights into...
     • social behavior
       – how many connections does an average person have?
       – do people connect with like-minded people?
     • political sentiment
       – what do people think about current political issues?
     • how we experience our cities
       – what’s the best neighborhood for food/nightlife?
     • how we build our careers
       – how often do people change careers?
       – how beneficial is it to ‘network’ professionally?
     • other?
  5. mining the social web
     there is already research that explores those questions
     we will discuss some of it now and in the next two lectures
  6. twitter
     • a social sensor
       – social network + news media
       – what is happening?
       – where? who?
       – trends
       – events
       – opinions
       – political views
       – sentiments
       – demographics
  7. twitter studies
     • finding news events and stories
     • detecting trends
     • predicting consumer behavior
     • predicting stock market(!)
     • disaster response
     • rumor analysis and credibility assessment
     • influence analysis
     • political analysis
       – polarization, bias of news media
     • sociology studies
       – sentiment vs. demographics, gender inequality
  8. • photo sharing + social network
     • photos contain additional information
       – tags
       – geolocation
       – comments, favorites
       – assigned to groups
  9. [image only]
  10. Eric Fischer
  11. recommend tourist itineraries
  12. foursquare
      • location-based social network
      • users check in to different locations
      • locations have types (hierarchy)
        – restaurant, sport venue, museum, college, …
      • questions:
        – where do people hang out?
        – where do events take place?
        – do friends influence each other?
  13. when/where people check in?
      [Figure (a): hourly check-in frequency during the day, for New York, London, Barcelona, Helsinki, and in total]
      The activity is at its lowest around [...] a.m. After that, there are three peaks: one when people go to work in the morning, one in the middle of the day, and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw feature values, we use the number of standard deviations from the mean, or z-score.
      [Figure: venues clustered by time of check-ins (time clusters in Paris; axes: hour, percentage)]
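The z-score step mentioned in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `hourly_z_scores`, the sample counts, and the choice to standardize each city's 24 hourly values against that city's own mean and standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def hourly_z_scores(counts):
    """Return the z-score of each hourly count relative to the day's mean.

    Assumption: standardization is done per city, across its 24 hourly values.
    """
    if len(counts) != 24:
        raise ValueError("expected 24 hourly counts")
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation
    if sigma == 0:
        # A perfectly flat profile has no deviation from the mean.
        return [0.0] * 24
    return [(c - mu) / sigma for c in counts]


# Hypothetical check-in counts per hour (00:00 to 23:00) for one city.
example_counts = [5, 3, 2, 1, 1, 2, 6, 14, 20, 15, 12, 14,
                  22, 21, 15, 13, 14, 18, 23, 25, 21, 16, 11, 8]

z = hourly_z_scores(example_counts)
peak_hour = max(range(24), key=lambda h: z[h])
```

Because each profile is rescaled to mean 0 and unit variance, cities with very different total check-in volumes can be compared on the shape of their daily activity rather than on its magnitude.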
  14. when/where people check in?
      (a) Venues in Paris and Barcelona with lowest and highest user entropy.

      lowest user entropy
      City       Name                          Category             Entropy
      Barcelona  Castellers de Barcelona       Non-Profit           0.0139
      Barcelona  Café de la Pompeu             Café                 0.0172
      Barcelona  Ràdio                         Radio Station        0.0176
      Paris      Boutique Orange               Electronics Store    0.0099
      Paris      Métro Goncourt []             Subway               0.0105
      Paris      Blue Acacia                   Office               0.0112

      highest user entropy
      City       Name                          Category             Entropy
      Barcelona  Plaça de Catalunya            Plaza                0.5835
      Barcelona  Sants Estació                 Train Station        0.6298
      Barcelona  Sagrada Família               Government Building  0.6309
      Barcelona  Camp Nou                      Stadium              0.6852
      Paris      Gare SNCF : Gare de Lyon      Train Station        0.6725
      Paris      Gare SNCF : Paris Nord        Train Station        0.6911
      Paris      Musée du Louvre               Museum               0.6924
      Paris      Tour Eiffel                   Government Building  0.7167
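One way to compute a per-venue user entropy like the one tabulated above is sketched below. The slide does not define the measure, so this is an assumption: Shannon entropy of the distribution of check-ins over users, divided by the log of the number of distinct users so that the result lies in [0, 1]. The function name `user_entropy` and the sample user IDs are hypothetical, and the normalization used for the values in the table may differ.

```python
from collections import Counter
from math import log


def user_entropy(checkin_user_ids):
    """Normalized Shannon entropy of check-ins over users at one venue.

    0.0 means a single user accounts for every check-in;
    1.0 means check-ins are spread evenly over all visiting users.
    """
    counts = Counter(checkin_user_ids)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((n / total) * log(n / total) for n in counts.values())
    # Assumed normalization: divide by the maximum entropy for this many users.
    return entropy / log(len(counts))


# Hypothetical venues: one dominated by a single regular, one visited evenly.
office = ["u1"] * 9 + ["u2"]
station = ["u1", "u2", "u3", "u4"]

office_entropy = user_entropy(office)
station_entropy = user_entropy(station)
```

Under this reading, low-entropy venues are places a few people visit repeatedly (an office, a club's meeting place), while high-entropy venues such as train stations and landmarks draw many different visitors.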
  15. data sources (less obvious)
      traffic sensors
  16. detecting events with traffic sensors
  17. project ideas (less obvious)
  18. your project
      come up with a project idea
      implement it!
      report on your results and findings
  19. types of projects
      • form a hypothesis and set out to test it
        – are rich people happier?
      • start with an interesting question
        – which are hipster neighborhoods in my city?
      • start with a business idea
        – recommend relevant music to music listeners
        – recommend clothes to music listeners
      • start with a problem that you (think you) can solve
        – how to identify trends in space and time?
      • start with a cool dataset and explore it
  20. your project
      1. set a goal for your project (what’s the question you want to answer?)
      2. study related literature (what has / hasn’t been done already? or do you think you can do it better?)
      3. collect data (some data are more difficult to come by)
      4. analyze data
      5. results
      6. evaluation (have you answered the question asked originally? possible improvements? future work?)
  21. coming up with a project idea
      • conferences: SIGKDD, ICWSM, WWW, WSDM
      • themes
        – urban computing, trend / event detection, social networks, political sentiment, privacy
        – other
      • google scholar
      • talk with us
        office hours: Mon, 14:15-15:30 and by appointment
  22. collecting the data
      • what data are available?
        – different platforms share different data about their users’ activity
        – browse dev sites of social networks and find out about privacy policies and APIs
        – browse public data repositories
        – the data mining group has data for blog posts, twitter, google+, facebook, foursquare
      • code
        Mining the Social Web (github)
        https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
  23. schedule
      • Today: overview
      • February 2nd: discuss literature (Aris)
      • February 9th: discuss literature (Michael)
      • February 16th and 23rd: present project proposals
      • March 30th: students submit progress report
      • March 30th and April 6th: intermediate presentations
      • May 4th and May 11th: final presentations
      • May 15th: final report due
  24. final report
      • introduction
      • related work
      • problem statement
      • proposed technique (algorithms)
      • data description
      • empirical evaluation
        – results
        – comparison with state of the art
      • future work
  25. grading
      • originality (has it been done before?)
      • potential impact (how interesting it is and why)
      • rigorousness of proposed technique
      • reproducibility (public code)
      • presentation
      • teams of 2 are encouraged
      • presentations and reports are required
      • surveys of existing techniques are ok, too
  26. schedule
      • Today: overview
      • February 2nd: discuss literature (Aris)
      • February 9th: discuss literature (Michael)
      • February 16th and 23rd: students present project proposals
      • March 30th: students submit progress report
      • March 30th and April 6th: intermediate presentations
      • May 4th and May 11th: final presentations
      • May 15th: final report due
  27. until then...
      • browse literature
        – see papers posted on noppa for a sample
        – conferences: KDD, ICWSM, WWW, WSDM
        – google scholar
      • dev websites, for example...
        https://dev.twitter.com, https://developers.facebook.com, https://developer.github.com/, https://developer.foursquare.com
      • code samples
        https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
      • data repositories
        http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/, http://wadam-data.dis.uniroma1.it
      • and talk to us!
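As a first exercise with data from a repository such as SNAP, the earlier question "how many connections does an average person have?" can be answered with a few lines of Python. The sketch below assumes a plain-text edge list with one whitespace-separated node pair per line and `#` comment lines, and it treats every edge as an undirected connection. The function name `average_degree` and the tiny sample graph are hypothetical.

```python
from collections import defaultdict
from io import StringIO


def average_degree(lines):
    """Average number of distinct neighbors per node in an edge list."""
    neighbors = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        u, v = line.split()[:2]
        if u == v:
            continue  # ignore self-loops
        # Assumption: connections are symmetric, so record both directions.
        neighbors[u].add(v)
        neighbors[v].add(u)
    if not neighbors:
        return 0.0
    return sum(len(n) for n in neighbors.values()) / len(neighbors)


# Tiny hypothetical friendship graph standing in for a downloaded file.
sample = StringIO("# FromNodeId\tToNodeId\n1\t2\n1\t3\n2\t3\n3\t4\n")
avg = average_degree(sample)
```

For a real dataset, the same function can be given an open file object instead of the `StringIO` sample, since it only iterates over lines.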
  28. see you next week!
      Aristides Gionis, Michael Mathioudakis
      contact: firstname.lastname@aalto.fi
      Office Hours: Mon, 14:15-15:30 and by appointment
