mining the social web

Aristides Gionis
Michael Mathioudakis
firstname.lastname@aalto.fi

Aalto University
Spring 2015
  
social web

facebook, twitter, linkedin
foursquare, flickr, instagram
pinterest, youtube, ustream
github, stackoverflow, wikipedia
  
social web

websites and platforms that enable users to

produce content
blog posts, ‘status’ messages, videos, pictures, podcasts

consume content
read text: blog posts, ‘status’ messages
listen to podcasts, watch videos

interact with each other
comment on each other’s posts, ‘like’ or rate items
  
mining the social web

a lot of users... a lot of data...
what could we learn*?
* assuming we have the data - more on that later

gain insights into...

social behavior
how many connections does an average person have?
do people connect with like-minded people?

political sentiment
what do people think about current political issues?

how we experience our cities
what’s the best neighborhood for food/nightlife?

how we build our careers
how often do people change careers?
how beneficial is it to ‘network’ professionally?

other?
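
The first question above (how many connections does an average person have?) can be sketched in a few lines once a list of connections is available. This is only an illustration, not something from the slides: the function name `average_degree` and the toy edge list are assumptions, and connections are treated as undirected.

```python
from collections import defaultdict


def average_degree(edges):
    """Mean number of distinct neighbours per user in an undirected graph."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    if not neighbours:
        return 0.0
    return sum(len(n) for n in neighbours.values()) / len(neighbours)


# Toy friendship list with made-up users (hypothetical data).
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")]
print(average_degree(edges))  # degrees are 2, 2, 3, 1, so this prints 2.0
```

Users with no connections never appear in an edge list, so on real data the result is an average over connected users only.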
  
mining the social web

there is already research that explores those questions

we will discuss some of it
now and in the next two lectures
  
twitter

•  a social sensor
   – social network + news media
   – what is happening?
   – where, who?
   – trends
   – events
   – opinions
   – political views
   – sentiments
   – demographics
  
twitter studies

•  finding news events and stories
•  detecting trends
•  predicting consumer behavior
•  predicting stock market(!)
•  disaster response
•  rumor analysis and credibility assessment
•  influence analysis
•  political analysis
   –  polarization, bias of news media
•  sociology studies
   –  sentiment vs. demographics, gender inequality
  
•  photo sharing + social network
•  photos contain additional information
   – tags
   – geolocation
   – comments, favorites
   – assigned to groups
  
Eric Fischer

recommend tourist itineraries
  
foursquare

•  location-based social network
•  users check in at different locations
•  locations have types (hierarchy)
   – restaurant, sport venue, museum, college, …
•  questions:
   – where do people hang out?
   – where do events take place?
   – do friends influence each other?
  
when/where people check in?

[Plot: hourly check-in frequency over the day (hours 0 to 20 on the axis) for New-York, London, Barcelona, Helsinki, and in total.]

(a) Hourly check-in frequency during the day. The activity is at its lowest around … a.m. and after that, there are three peaks: one when people go to work in the morning, one in the middle of the day and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw values of features, we use the number of standard deviations from the mean, i.e. the z-score.

[Plot: time clusters in Paris; axes: hour, percentage.]

Figure: Venues clustered by time of check-ins.
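
The caption above replaces raw hourly counts with z-scores so that cities of different size and activity can be compared. A minimal sketch of that normalisation follows. The function name `hourly_z_scores` and the sample counts are assumptions, as is the use of the population standard deviation, which the source does not specify.

```python
from statistics import mean, pstdev


def hourly_z_scores(counts):
    """Express each hourly check-in count as standard deviations from the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation (an assumption)
    if sigma == 0:
        # A flat profile carries no peaks, so every hour scores zero.
        return [0.0] * len(counts)
    return [(c - mu) / sigma for c in counts]


# Made-up counts for a handful of hours in one city (hypothetical data).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for hour, z in enumerate(hourly_z_scores(sample)):
    print(hour, round(z, 2))
```

Each city is normalised against its own mean and spread, so a morning peak in a small city and one in a large city end up on the same scale.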
  
when/where people check in?

City       Name                         Category             Entropy

lowest user entropy
Barcelona  Castellers de Barcelona      Non-Profit           0.0139
Barcelona  Café de la Pompeu            Café                 0.0172
Barcelona  Ràdio                        Radio Station        0.0176
Paris      Boutique Orange              Electronics Store    0.0099
Paris      Métro Goncourt []            Subway               0.0105
Paris      Blue Acacia                  Office               0.0112

highest user entropy
Barcelona  Plaça de Catalunya           Plaza                0.5835
Barcelona  Sants Estació                Train Station        0.6298
Barcelona  Sagrada Família              Government Building  0.6309
Barcelona  Camp Nou                     Stadium              0.6852
Paris      Gare SNCF : Gare de Lyon     Train Station        0.6725
Paris      Gare SNCF : Paris Nord       Train Station        0.6911
Paris      Musée du Louvre              Museum               0.6924
Paris      Tour Eiffel                  Government Building  0.7167

(a) Venues in Paris and Barcelona with lowest and highest user entropy.
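
The table reports a user entropy per venue without giving its definition. One plausible reading, sketched below, is the Shannon entropy of the venue's check-ins over users, scaled into the range 0 to 1. The function name `user_entropy`, the choice to normalise by the logarithm of the total number of check-ins, and the sample data are all assumptions, not the method behind the table.

```python
import math
from collections import Counter


def user_entropy(checkin_users):
    """Normalised Shannon entropy of a venue's check-ins across users.

    `checkin_users` holds one user id per check-in. The result is 0.0 when a
    single user accounts for every check-in and 1.0 when every check-in comes
    from a different user.
    """
    counts = Counter(checkin_users)
    total = sum(counts.values())
    if len(counts) <= 1:
        return 0.0  # zero or one distinct user: no uncertainty
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(total)  # normalisation choice is an assumption


# Hypothetical venues: a café with one regular, and a busy station.
cafe = ["regular"] * 99 + ["visitor"]
station = [f"user{i}" for i in range(100)]
print(user_entropy(cafe), user_entropy(station))
```

Under this reading, low values mark venues dominated by a few regulars (an office, a small café) and high values mark venues visited by many different people (stations, landmarks), which matches the pattern in the table.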
  
less obvious data sources
traffic sensors

detecting events with traffic sensors

less obvious project ideas
  
your project

come up with a project idea
implement it!
report on your results and findings
  
types of projects

•  form a hypothesis and set out to test it
   –  are rich people happier?
•  start with an interesting question
   –  which are hipster neighborhoods in my city?
•  start with a business idea
   –  recommend relevant music to music listeners
   –  recommend clothes to music listeners
•  start with a problem that you (think) can solve
   –  how to identify trends in space and time?
•  start with a cool dataset and explore it
  
your project

1  set a goal for your project
   (what’s the question you want to answer)
2  study related literature
   (what has / hasn’t been done already? or you think you can do it better)
3  collect data
   (some data are more difficult to come by)
4  analyze data
5  results
6  evaluation
   (have you answered the question asked originally? possible improvements? future work?)
  
coming up with a project idea

•  conferences:
   SIGKDD, ICWSM, WWW, WSDM
•  themes
   –  urban computing, trend / event detection, social networks, political sentiment, privacy
   –  other
•  google scholar
•  talk with us
   office hours: Mon, 14:15-15:30
   and by appointment
  
collecting the data

•  what data are available?
   –  different platforms share different data about their users’ activity
   –  browse dev sites of social networks to find out about privacy policies and APIs
   –  browse public data repositories
   –  the data mining group has data for blog posts, twitter, google+, facebook, foursquare
•  code
   Mining the Social Web (github)
   https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
  
schedule

•  Today: overview
•  February 2nd: discuss literature (Aris)
•  February 9th: discuss literature (Michael)
•  February 16th and 23rd: present project proposals
•  March 30th: students submit progress report
•  March 30th and April 6th: intermediate presentations
•  May 4th and May 11th: final presentations
•  May 15th: final report due
  
final report

•  introduction
•  related work
•  problem statement
•  proposed technique (algorithms)
•  data description
•  empirical evaluation
   –  results
   –  comparison with state of the art
•  future work
  
grading

•  originality (has it been done before)
•  potential impact (how interesting it is and why)
•  rigorousness of proposed technique
•  reproducibility (public code)
•  presentation
•  teams of 2 are encouraged
•  presentations and reports are required
•  surveys of existing techniques are ok, too
  
schedule

•  Today: overview
•  February 2nd: discuss literature (Aris)
•  February 9th: discuss literature (Michael)
•  February 16th and 23rd: students present project proposals
•  March 30th: students submit progress report
•  March 30th and April 6th: intermediate presentations
•  May 4th and May 11th: final presentations
•  May 15th: final report due
  
until then...

browse literature
see papers posted on noppa for a sample
conferences KDD, ICWSM, WWW, WSDM
google scholar

dev websites, for example...
https://dev.twitter.com, https://developers.facebook.com,
https://developer.github.com/, https://developer.foursquare.com

code samples,
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition

data repositories,
http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/,
http://wadam-data.dis.uniroma1.it

and talk to us!
  
see you next week!

Aristides Gionis
Michael Mathioudakis
contact: firstname.lastname@aalto.fi

Office Hours: Mon, 14:15-15:30
and by appointment
  

Introduction to AI for Nonprofits with Tapp Network
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Delivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and TrainingDelivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and Training
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 

Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides

  • 1. mining the social web
    Aristides Gionis
    Michael Mathioudakis
    firstname.lastname@aalto.fi
    Aalto University
    Spring 2015
  • 2. social web
    facebook twitter linkedin
    foursquare flickr instagram
    pinterest youtube ustream
    github stackoverflow wikipedia
  • 3. social web
    websites and platforms that enable users to
    produce content
      blog posts, ‘status’ messages, videos, pictures, podcasts
    consume content
      read text - blog posts, ‘status’ messages
      listen to podcasts, watch videos
    interact with each other
      comment on each other’s posts, ‘like’ or rate items
  • 4. mining the social web
    a lot of users... a lot of data...
    what could we learn*?
    * assuming we have the data - more on that later
    gain insights into...
    social behavior
      how many connections does an average person have?
      do people connect with like-minded people?
    political sentiment
      what do people think about current political issues?
    how we experience our cities
      what’s the best neighborhood for food/nightlife?
    how we build our careers
      how often do people change careers?
      how beneficial is it to ‘network’ professionally?
    other?
  • 5. mining the social web
    there is already research that explores those questions
    we will discuss some of it now and in the next two lectures
  • 6. twitter
    • a social sensor
      – social network + news media
      – what is happening?
      – where, who? happening?
      – trends
      – events
      – opinions
      – political views
      – sentiments
      – demographics
  • 7. twitter studies
    • finding news events and stories
    • detecting trends
    • predicting consumer behavior
    • predicting stock market(!)
    • disaster response
    • rumor analysis and credibility assessment
    • influence analysis
    • political analysis
      – polarization, bias of news media
    • sociology studies
      – sentiment vs. demographics, gender inequality
  • 8. • photo sharing + social network
    • photos contain additional information
      – tags
      – geolocation
      – comments, favorites
      – assigned to groups
  • 12. foursquare
    • location-based social network
    • users check-in to different locations
    • locations have types (hierarchy)
      – restaurant, sport venue, museum, college, …
    • questions:
      – where do people hang out?
      – where events take place?
      – do friends influence each other?
  • 13. when/where people check in?
    [Figure (a): hourly check-in frequency during the day for New-York, London, Barcelona, Helsinki, and in total; x-axis: hour, y-axis: percentage]
    (a) Hourly check-ins frequency during the day. The activity is at its lowest around a.m. and after that, there are three peaks: one when people go to work in the morning, one in the middle of the day and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw values of features, we use the number of standard deviations from the mean, or z-score.
    [Figure: time clusters in Paris; x-axis: hour, y-axis: percentage]
    Figure: Venues clustered by time of check-ins.
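    The z-score step mentioned on slide 13 can be sketched as follows. This is a minimal illustration, not the code behind the slide: the function name `zscores` and the sample counts in `hourly_checkins` are hypothetical, and the population standard deviation is an assumption, since the slide does not say which variant was used.

```python
from statistics import mean, pstdev


def zscores(values):
    """Return each value expressed as standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed variant)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical check-in counts for eight time slots in one city.
hourly_checkins = [2, 4, 4, 4, 5, 5, 7, 9]
scores = zscores(hourly_checkins)
```

    Because each city's counts are rescaled by that city's own mean and spread, a peak in a small city and a peak in a large one become comparable, which is the motivation the caption gives for not using raw values.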
  • 14. when/where people check in?
    (a) Venues in Paris and Barcelona with lowest and highest user entropy.

    Lowest user entropy
    City       Name                         Category             Entropy
    Barcelona  Castellers de Barcelona      Non-Profit           0.0139
    Barcelona  Café de la Pompeu            Café                 0.0172
    Barcelona  Ràdio                        Radio Station        0.0176
    Paris      Boutique Orange              Electronics Store    0.0099
    Paris      Métro Goncourt []            Subway               0.0105
    Paris      Blue Acacia                  Office               0.0112

    Highest user entropy
    City       Name                         Category             Entropy
    Barcelona  Plaça de Catalunya           Plaza                0.5835
    Barcelona  Sants Estació                Train Station        0.6298
    Barcelona  Sagrada Família              Government Building  0.6309
    Barcelona  Camp Nou                     Stadium              0.6852
    Paris      Gare SNCF : Gare de Lyon     Train Station        0.6725
    Paris      Gare SNCF : Paris Nord       Train Station        0.6911
    Paris      Musée du Louvre              Museum               0.6924
    Paris      Tour Eiffel                  Government Building  0.7167
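    One way to read the "user entropy" column on slide 14 is as the Shannon entropy of how a venue's check-ins are spread across users. The sketch below rests on that assumption: the slide does not give the exact definition or normalization, so these values are not expected to match the table. The function name `user_entropy` and the user IDs are hypothetical.

```python
import math
from collections import Counter


def user_entropy(checkin_user_ids):
    """Shannon entropy (in nats) of a venue's check-ins across users.

    Low values: a few users account for most check-ins.
    High values: check-ins are spread over many different users.
    """
    counts = Counter(checkin_user_ids)
    total = sum(counts.values())
    # sum of p * log(1/p), with p = c / total for each user
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical venues: an office visited by one regular,
# and a plaza visited once each by four different people.
office = user_entropy(["u1", "u1", "u1", "u1"])
plaza = user_entropy(["u1", "u2", "u3", "u4"])
```

    Under this reading, the office scores 0 and the plaza scores log 4, which matches the pattern in the table: offices and small shops at the low end, train stations and landmarks at the high end.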
  • 15. data sources less obvious
    traffic sensors
  • 16. detecting events with traffic sensors
  • 17. project ideas less obvious
  • 18. your project
    come up with a project idea
    implement it!
    report on your results and findings
  • 19. types of projects
    • form a hypothesis and set out to test it
      – are rich people happier?
    • start with an interesting question
      – which are hipster neighborhoods in my city?
    • start with a business idea
      – recommend relevant music to music listeners
      – recommend clothes to music listeners
    • start with a problem that you (think you) can solve
      – how to identify trends in space and time?
    • start with a cool dataset and explore it
  • 20. your project
    1 set a goal for your project
      (what’s the question you want to answer)
    2 study related literature
      (what has / hasn’t been done already? or you think you can do it better)
    3 collect data
      (some data are more difficult to come by)
    4 analyze data
    5 results
    6 evaluation
      (have you answered the question asked originally? possible improvements? future work?)
  • 21. coming up with a project idea
    • conferences: SIGKDD, ICWSM, WWW, WSDM
    • themes
      – urban computing, trend / event detection, social networks, political sentiment, privacy
      – other
    • google scholar
    • talk with us
      office hours: Mon, 14:15-15:30 and by appointment
  • 22. collecting the data
    • what data are available?
      – different platforms share different data about their users’ activity
      – browse dev sites of social networks: find out about privacy policies and APIs
      – browse public data repositories
      – the data mining group has data for blog posts, twitter, google+, facebook, foursquare
    • code
      Mining the Social Web (github)
      https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
  • 23. schedule
    • Today: overview
    • February 2nd: discuss literature (Aris)
    • February 9th: discuss literature (Michael)
    • February 16th and 23rd: present project proposals
    • March 30th: students submit progress report
    • March 30th and April 6th: intermediate presentations
    • May 4th and May 11th: final presentations
    • May 15th: final report due
  • 24. final report
    • introduction
    • related work
    • problem statement
    • proposed technique (algorithms)
    • data description
    • empirical evaluation
      – results
      – comparison with state of the art
    • future work
  • 25. grading
    • originality (has it been done before)
    • potential impact (how interesting it is and why)
    • rigorousness of proposed technique
    • reproducibility (public code)
    • presentation
    • teams of 2 are encouraged
    • presentations and reports are required
    • surveys of existing techniques are ok, too
  • 26. schedule
    • Today: overview
    • February 2nd: discuss literature (Aris)
    • February 9th: discuss literature (Michael)
    • February 16th and 23rd: students present project proposals
    • March 30th: students submit progress report
    • March 30th and April 6th: intermediate presentations
    • May 4th and May 11th: final presentations
    • May 15th: final report due
  • 27. until then...
    browse literature
      see papers posted on noppa for a sample
      conferences KDD, ICWSM, WWW, WSDM
      google scholar
    dev websites, for example...
      https://dev.twitter.com, https://developers.facebook.com,
      https://developer.github.com/, https://developer.foursquare.com
    code samples,
      https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
    data repositories,
      http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/,
      http://wadam-data.dis.uniroma1.it
    and talk to us!
  • 28. see you next week!
    Aristides Gionis
    Michael Mathioudakis
    contact: firstname.lastname@aalto.fi
    Office Hours: Mon, 14:15-15:30 and by appointment