SlideShare a Scribd company logo
1 of 14
Download to read offline
06/05/2013	
  
1	
  
Data	
  mining	
  for	
  
analyzing	
  the	
  
social	
  media	
   Social	
  
Networks	
  
Video/picture	
  
sharing	
  
Opinions	
  
News	
  websites	
  
Blogs	
  
Knowledge	
  
sharing	
  Microblogging	
  
eminar	
  at	
  	
  	
   	
   	
   	
   	
  	
  	
  	
  	
  	
  	
  	
  	
  4/18/2013	
  
PresentaCon:	
  J.	
  Velcin	
  
hGp://mediamining.univ-­‐lyon2.fr/people/velcin	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Ecosystem	
  of	
  ERIC	
  Lab	
  
2
Axe Carrés 2 ter
BSc	
  &	
  MSc	
  degrees	
  
BI,	
  data	
  mining,	
  staCsCcs	
  2	
  teams:	
  SID	
  &	
  DMD	
  
Academics	
  
Companies	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
Lyon	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Research	
  landscape	
  
3
Data	
  
Data-­‐
warehouse	
  
Knowledge	
  
ETL	
  
Online	
  analysis	
  
Data	
  mining	
  
D
e
c
i
s
i
o
n	
  
Complex	
  data	
  
integraCon	
  
MulCdimensional	
  
modeling	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
Data	
  Mining	
  &	
  
Decision	
  (DMD)	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Data	
  Mining	
  &	
  Decision	
  (DMD)	
  
4
Social	
  
Networks	
  
Microblogging	
  
	
  
Video/picture	
  
sharing	
  
Opinion	
  sharing	
  
News	
  websites	
  
Blogs	
  
Knowledge	
  
sharing	
  
e.g.	
  Social	
  Media	
  
-­‐ 	
  heterogeneous	
  
-­‐ 	
  voluminous	
  
-­‐ 	
  interconnected	
  
-­‐ 	
  evolving	
  
RecommandaCon	
  
Summzariz
aCon	
  
InformaCon	
  
retrieval	
  
MulCcriteria	
  
analysis	
  
Machine	
  
learning	
  
Graph	
  analysis	
  
Complex	
  data	
  
analysis	
  
Topological	
  
learning	
  
Text	
  mining	
  
Prac<cal	
  issue	
  
Approach	
  
Goal:	
  coping	
  with	
  complex	
  data	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
2	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Outline	
  
	
  
"  The	
  big	
  picture	
  
"  Modeling	
  and	
  analyzing	
  online	
  discussions	
  
"  Semi-­‐supervised	
  clustering	
  
"  Focus	
  on	
  Project	
  ImagiWeb	
  
"  Future	
  lines	
  of	
  research	
  
5
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Outline	
  
"  The	
  big	
  picture	
  
"  Modeling	
  and	
  analyzing	
  online	
  discussions	
  
"  Semi-­‐supervised	
  clustering	
  
"  Focus	
  on	
  Project	
  ImagiWeb	
  
"  Future	
  lines	
  of	
  research	
  
6
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
Section	
  1	
  
The	
  big	
  picture	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
"   A	
  long	
  questioning	
  
"   Social	
  representation	
  through	
  the	
  media	
  
[Lippman,22]	
  [Moscovici,76]	
  [Newman	
  and	
  Block,06]	
  
"   Numeric	
  watch	
  on	
  the	
  Web	
  
[Chateauraynaud,03]	
  
8
Public	
  event	
  
From	
  facts	
  to	
  people:	
  the	
  essential	
  role	
  of	
  media	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
3	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Information	
  overload	
  
9
Image	
  credit:	
  Go-­‐Globe.com	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Data	
  journalism	
  
10
"   Crucial	
  need	
  to	
  catch	
  the	
  meaning	
  of	
  voluminous	
  data	
  provided	
  by	
  modern	
  
social	
  media,	
  in	
  order	
  to	
  design	
  new	
  search	
  engine	
  systems	
  
"   In	
  particular	
  (MSND	
  workshop@WWW’12)	
  
"   “How	
  to	
  surface	
  the	
  best	
  comments,	
  videos	
  and	
  pictures	
  from	
  a	
  variety	
  of	
  sources	
  in	
  
real	
  time	
  and	
  then	
  how	
  to	
  verify	
  them	
  ?”	
  
"   “How	
  to	
  quickly	
  surface	
  the	
  best	
  comments	
  and	
  work	
  out	
  which	
  ones	
  are	
  worth	
  
investigating	
  further	
  ?”	
  
"   “How	
  to	
  identify	
  quickly	
  the	
  key	
  influencers	
  on	
  any	
  particular	
  story,	
  so	
  they	
  can	
  get	
  
inside	
  information	
  or	
  interview	
  them	
  for	
  their	
  news	
  outlets	
  ?”	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Salvaged	
  by	
  (media)	
  curation?	
  
" Term	
  originated	
  from	
  Art,	
  appears	
  ~2011	
  
" Three-­‐step	
  process:	
  
" Aggregation:	
  gathering	
  
" Editorialize:	
  sorting,	
  categorizing,	
  
summarizing,	
  presenting…	
  
" Disseminate:	
  contextualizing,	
  sharing	
  
"   Important	
  role	
  of	
  the	
  curator	
  
"   Difference	
  between	
  “full	
  curation”	
  and	
  
automatic	
  edition	
  (e.g.,	
  paper.li)	
  
"   Many	
  platforms	
  (Scoop.it!,	
  Storify,	
  Storiful,	
  
Hopflow,	
  Stumbleupon,	
  Patch…):	
  
http://socialcompare.com/fr/comparison/curation-­‐
platforms-­‐amplify-­‐knowledge-­‐plaza-­‐storify	
  	
  
	
  
11
[Rosenbaum,11]	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
A	
  case	
  study:	
  the	
  “HuffPost”	
  
12
"   Linked	
  with	
  social	
  networks	
  
"   Topically	
  indexed	
  
"   Available	
  on	
  various	
  devices	
  
"   Commented	
  news	
  
"   Community	
  of	
  bloggers	
  
"   Journalist	
  can	
  play	
  both	
  the	
  roles	
  of	
  
curator	
  and	
  community	
  manager	
  
	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
4	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Outline	
  
"  The	
  big	
  picture	
  
"  Modeling	
  and	
  analyzing	
  online	
  discussions	
  
"  Semi-­‐supervised	
  clustering	
  
"  Focus	
  on	
  Project	
  ImagiWeb	
  
"  Future	
  lines	
  of	
  research	
  
13
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
Section	
  2	
  
Modeling	
  and	
  analyzing	
  
online	
  discussions	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Online	
  discussions	
  
"   Motivation:	
  
"   Numerous	
  available,	
  often	
  underused	
  data	
  
"   Crucial	
  to	
  feel	
  the	
  opinion	
  of	
  people	
  	
  
	
  
"   Contributions:	
  
"   Recommending	
  key	
  messages	
  [Stavrianou	
  et	
  al.,09,10]	
  
"   Extracting	
  the	
  latent	
  social	
  network	
  [Forestier	
  et	
  al.,11]	
  
"   Detecting	
  celebrities	
  from	
  online	
  forums	
  [Forestier	
  et	
  al.,12]	
  
"   Surfacing	
  roles	
  with	
  unsupervised	
  mechanisms	
  [Anukhin	
  et	
  al.,12]	
  
15
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
   16Julien Velcin - présentation ARC6 18 Octobre 2012
06/05/2013	
  
5	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Anatomy	
  of	
  an	
  online	
  discussion	
  
17
A	
  
B	
  
C	
  
A	
  
C	
  
B	
  
D D
A
B
C
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Recommending	
  key	
  messages	
  
"   “interesting”	
  message:	
  popular,	
  opinionated,	
  pioneer	
  etc.	
  
" Formalization	
  of	
  6	
  criteria	
  +	
  simple	
  aggregation	
  
" Comparison	
  to	
  manually-­‐labelled	
  data	
  on	
  8	
  french	
  forums	
  
" Results	
  for	
  a	
  priori	
  evaluation:	
  
"   F1-­‐Measure	
  ranges	
  from	
  0.2	
  to	
  0.3	
  for	
  a	
  single	
  criterion	
  
"   F1-­‐Measure	
  equals	
  0.48	
  for	
  aggregated	
  criteria	
  (simple	
  mean)	
  
" Results	
  for	
  a	
  posteriori	
  evaluations:	
  
18
1	
  
[Stavrianou	
  et	
  al.,09,10]	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Extracting	
  the	
  (latent)	
  social	
  network	
  
"   Latent SN = reply-to links + name citation + text quotation
"   Name citation: bad spelling, compound names, abbreviations…
(what about “obama49”?)
"   Our solution: edit distance, soundex, PoS to detect nouns
"   Text quotation: cut-paste without quotation marks, rephrasing…
"   Our solution: string matching, locality principle (comparing close
messages), use quotation marks if provided
19
2	
  
[Forestier	
  et	
  al.,11]	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Detecting	
  celebrities	
  
" Modeling the forum discussion with a graph G=(V,E)
" vertice v = forum participant
" edge e = link (implicit or explicit) between two participants
" Weighted in-degree of v: deg-(v)
" Weighted out-degree of v: deg+(v)
"   p(v) = set of messages posted by v
"   p~ = average of messages
" thr(v) = set of threads not initiated by v
20
3	
   [ForesCer	
  et	
  al.,12]	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
6	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Detecting	
  celebrities	
  
"   Extracting social roles from a SN is a key issue
[Fisher et al.,06] [Himelboim et al.,09] [Forestier et al.,12]
"   Some examples of roles:
"   Leader: very participative user, who initiates discussion threads and
makes the animation
"   Expert: user particularly active in a restrictive number of topics
"   Celebrity: public person well known by the participants
" Flammer: user with a negative behavior, who can generate conflicts
"   Lurker: user who has a low participation in the discussion
"   In the following, we have chosen to focus on the explicit “celebrity”
role within online discussion forums
21
3	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Detecting	
  celebrities	
  
" Formalize the criteria given by [Golder and Donath,04]
22
3	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Detecting	
  celebrities	
  
"   Based on these atomic criteria, we define 3 meta-criteria:
"   meta-criterion 1: all the basic criteria must be satisfied (necessary
conditions), and we rank the interesting users in descending order
relative to the total number of posts
"   meta-criterion 2: id. but with a ranking depending on the user’s average
forum participation multiplied by the number of posts
"   meta-criterion 3: id. but taking into account name citation and text
quotation
"   Evaluation measure: compare the ranking of our meta-criteria with
the number of fans of each user (>800) = gold standard
"   Dataset:
"   57 forums from the US version of the Huffington Post
"   3 topics: politics, media, living
"   Overall 11,443 unique users and 35,175 posts
23
3	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
   24
[Forestier	
  et	
  al.,12]	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
7	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Surfacing	
  roles	
  
"   New collaboration between and
"   Bottom-up “emerging” roles:
25
Axe Carrés 2 ter
4	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Surfacing	
  roles	
  
"   Discussions about 6 popular TV shows from TWOP forums
"   Parent-child relationship is restored using “quote” mechanism:
"   check previous 20 messages in the thread;
"   a parent has to contain at least 95% of the quoted text.
26
4	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Surfacing	
  roles	
  
" Profiling users using temporal-aware features:
" weighted in-degree,
" weighted out-degree,
" node in-g-index,
" node out-g-index,
" catalytic power,
" number of posts,
"   cross-topic entropy.
"   The role identification procedure is applied to the time series of
feature vectors of 1 263 forum users.
" Using moving time windows (size=1 week, shift=1 day)
27
4	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Surfacing	
  roles	
  
"   Clustering time series
"   Basic k-means algorithm
" Hartigan’s index used for estimating the best k
28
[Anokhin	
  et	
  al.,12]	
  
4	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
8	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Surfacing	
  roles	
  
" Some	
  observations:	
  
29
4	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Outline	
  
"  The	
  big	
  picture	
  
"  Modeling	
  and	
  analyzing	
  online	
  discussions	
  
"  Semi-­‐supervised	
  clustering	
  
"  Focus	
  on	
  Project	
  ImagiWeb	
  
"  Future	
  lines	
  of	
  research	
  
30
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
Section	
  4	
  
Semi-­‐supervised	
  
clustering	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Temporal-­‐driven	
  clustering	
  
"   Goal:	
  detecting	
  typical	
  
patterns	
  over	
  time	
  
"   How	
  to	
  deal	
  with	
  temporally	
  
described	
  entities?	
  
"   Applications:	
  
"   Evolution	
  of	
  nation’s	
  political	
  
states	
  (proof	
  of	
  concept)	
  
"   Trajectories	
  over	
  roles	
  
"   Evolution	
  of	
  entities’	
  images	
  
(c.f.	
  ImagiWeb)	
  
32
φ2	
  
φ1	
  
t1	
  
t2	
  
t3	
  
t1	
  
t2	
  
t3	
  
x1
d	
  
x2
d	
  
x3
d	
  
x4
d	
  
x5
d	
  
x6
d	
  
t2	
   t3	
  t1	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
9	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Temporal-­‐driven	
  clustering	
  
" Detect	
  typical	
  evolution	
  patterns	
  of	
  
individuals	
  in	
  the	
  dataset:	
  
"   phases	
  through	
  which	
  the	
  entity	
  
collection	
  went	
  over	
  time	
  
" trajectory	
  of	
  entities	
  through	
  the	
  
different	
  phases	
  
33
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Temporal-­‐aware	
  constrained	
  clustering	
  
"   The	
  resulted	
  partition	
  must	
  ensure:	
  
"   descriptive	
  coherence	
  of	
  clusters;	
  
"   temporal	
  coherence	
  of	
  clusters;	
  
" continuous	
  segmentation	
  of	
  observations	
  	
  
belonging	
  to	
  an	
  entity	
  
"   Objective	
  function	
  to	
  minimize	
  (inspired	
  by	
  semi-­‐supervised	
  clustering	
  
clustering	
  [Wagstaff	
  and	
  Cardie,00])	
  +	
  use	
  of	
  K-­‐Means-­‐like	
  algorithm:	
  
34
Temporal-­‐aware	
  
dissimilarity	
  measure	
  
ConCguity	
  penalty	
  
measure	
  
(a)	
  
(b)	
  
(a)	
   (b)	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Experiments	
  on	
  political	
  dataset	
  
"   23	
  countries,	
  60	
  years	
  
"   207	
  political,	
  demographic,	
  social	
  and	
  economic	
  variables	
  
"   Running	
  TDCK-­‐Means	
  (8	
  clusters,	
  β	
  =	
  0.003	
  and	
  δ	
  =	
  3)	
  
35
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Experiments	
  on	
  political	
  dataset	
  
36
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
10	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Experiments	
  on	
  political	
  dataset	
  
37
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Experiments	
  on	
  political	
  dataset	
  
38
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Outline	
  
"  The	
  big	
  picture	
  
"  Modeling	
  and	
  analyzing	
  online	
  discussions	
  
"  Semi-­‐supervised	
  clustering	
  
"  Focus	
  on	
  Project	
  ImagiWeb	
  
"  Future	
  lines	
  of	
  research	
  
39
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
Section	
  5	
  
Focus	
  on	
  Project	
  
ImagiWeb	
  
hGp://eric.univ-­‐lyon2.fr/~jvelcin/imagiweb	
  
06/05/2013	
  
11	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Project	
  ImagiWeb	
  
"   Goal	
  of	
  Project	
  ANR	
  ImagiWeb:	
  analyzing	
  the	
  life	
  cycle	
  (production,	
  diffusion,	
  
evolution)	
  of	
  images	
  through	
  the	
  Web	
  2.0	
  
" Strong	
  points:	
  
"   Joint	
  analysis	
  of	
  opinions,	
  topics,	
  social	
  networks…	
  
" Involvement	
  of	
  (true)	
  researchers	
  in	
  LLSSH	
  
" Partners:	
  
"   ERIC:	
  data	
  mining,	
  machine	
  learning	
  
"   LIA:	
  text/opinion	
  mining,	
  information	
  retrieval	
  
"   CEPEL:	
  social	
  scientists,	
  specialist	
  in	
  politics	
  study	
  
"   XRCE:	
  information	
  extraction,	
  NLP	
  
"   AMI	
  Soft.:	
  numeric	
  watch	
  
"   EDF	
  R&D:	
  end-­‐user,	
  semiology	
  study	
  
41
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Project	
  ImagiWeb	
  
42
!"#$%&'
("$)*"+,$)&'
)-.'/')"$*)0*1&)&2'
3455)&'0461#7,)&'
(5+8)'
%51&)'
(5+8)'
0)*9,)'
(5+8)'
0)*9,)'
(5+8)'
0)*9,)'
:455)"$+1*)&'
;%<1+&'<)'
=455,"1=+#4"'
>&1$)&'?)@2'06+7,)A)2')$=.B'
C"+6D&)'<)&'<4""%)&'
<E)-0*)&&14"'
C"+6D&)'<)&'
040,6+#4"&'
F))<@+=G' (;CH(I!J'
%5)A),*&'
%5)A),*&'
*%=)0$),*&'
*%=)0$),*&'
*%=)0$),*&'
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Platform	
  for	
  performing	
  the	
  annotation	
  
"   Web	
  applications	
  designed	
  for	
  annotating	
  ~10k	
  tweets	
  +	
  200	
  blog	
  comments;	
  22	
  
annotators	
  are	
  working	
  on	
  it	
  right	
  now!	
  
"   Output:	
  (mφ	
  ;	
  mt;	
  mp	
  ;	
  ma	
  ;	
  mt	
  ;	
  ms	
  )	
  
43
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Platform	
  for	
  performing	
  the	
  annotation	
  
"   Web	
  applications	
  designed	
  for	
  annotating	
  ~10k	
  tweets	
  +	
  200	
  blog	
  comments;	
  22	
  
annotators	
  are	
  working	
  on	
  it	
  right	
  now!	
  
"   Output:	
  (mφ	
  ;	
  mt;	
  mp	
  ;	
  ma	
  ;	
  mt	
  ;	
  ms	
  )	
  
44
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
12	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Catching	
  image’s	
  evolution	
  over	
  time	
  
"   Input:	
  set	
  of	
  tuples	
  (mφ	
  ;	
  mt;	
  mp	
  ;	
  ma	
  ;	
  mt	
  ;	
  ms	
  )	
  
"   Some	
  good	
  questions:	
  
"   What	
  is	
  an	
  image?	
  
"   How	
  to	
  sum	
  up	
  the	
  bunch	
  of	
  (temporally-­‐situated	
  and	
  spatially-­‐located)	
  opinions?	
  
"   First	
  insight:	
  investigating	
  time	
  series	
  analysis,	
  temporally-­‐driven	
  clustering,	
  
graphical	
  models…	
  
"   Fortunately	
  we’ll	
  have	
  a	
  fulltime	
  post-­‐doc	
  student	
  to	
  work	
  on	
  it!	
  
45
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Recent	
  work	
  on	
  opinion	
  mining	
  
"   Participation	
  to	
  Sem-­‐Eval	
  2013	
  
" Task	
  2.B:	
  Discriminating	
  positive	
  (+)	
  from	
  negative	
  (-­‐)	
  
opinions	
  (+	
  neutral)	
  
" Very	
  recent	
  work:	
  improving	
  basic	
  NB	
  by	
  using	
  
background	
  knowledge	
  (seed	
  lists)	
  
"   6/35	
  and	
  3/16	
  on	
  the	
  official	
  tweet	
  dataset!	
  
" Results	
  on	
  our	
  own	
  datasets:	
  
46
[paper	
  just	
  submiGed]	
  
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   ½	
  -­‐sup.	
  clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Outline	
  
"  The	
  big	
  picture	
  
"  Modeling	
  and	
  analyzing	
  online	
  discussions	
  
"  Semi-­‐supervised	
  clustering	
  
"  Focus	
  on	
  Project	
  ImagiWeb	
  
"  Future	
  lines	
  of	
  research	
  
47
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   Topics	
   Clustering	
   ImagiWeb	
   Conclusion	
  
Section	
  6	
  
Future	
  lines	
  of	
  
research	
  
06/05/2013	
  
13	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
An	
  integrated	
  view	
  
	
   	
   	
   	
   	
  	
  	
  	
  	
  	
  Research	
  +	
  tools	
  +	
  applications	
  
"   Ongoing	
  Research	
  
"   Structured	
  temporal-­‐driven	
  clustering	
  (M.	
  A.	
  Rizoiu,	
  PhD	
  student)	
  
"   Bridging	
  the	
  gap	
  between	
  topics	
  and	
  concepts	
  (M.	
  A.	
  Rizoiu,	
  PhD	
  student)	
  
"   Multi-­‐document	
  summarization	
  of	
  online	
  discussions	
  (C.	
  Cercel,	
  PhD	
  student,	
  in	
  
collaboration	
  with	
  the	
  Polytechnic	
  Institute	
  of	
  Bucharest)	
  
"   Bottom-­‐up,	
  dynamic	
  extraction	
  of	
  roles	
  (A.	
  Lumbreras,	
  PhD	
  students,	
  in	
  
collaboration	
  with	
  Technicolor)	
  
"   Dynamic	
  joint	
  extraction	
  of	
  topics	
  and	
  opinions	
  (M.	
  Dermouche,	
  PhD	
  student,	
  in	
  
collaboration	
  with	
  AMI	
  Software)	
  
"   Extracting	
  opinionated	
  images	
  from	
  tweets	
  and	
  blogs	
  in	
  an	
  unsupervised	
  way	
  (Y.	
  
Kim,	
  post-­‐doc	
  student,	
  in	
  collaboration	
  with	
  LIA)	
  
49
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   Topics	
   Clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
An	
  integrated	
  view	
  
"   Tools	
  
" MediaMining:	
  a	
  full	
  open-­‐access	
  platform	
  for	
  analyzing	
  online	
  discussions	
  
"   Applications	
  
"   Reputation	
  Management	
  services	
  
	
  =>	
  Project	
  ImagiWeb,	
  with	
  specialist	
  in	
  political	
  studies	
  (2012-­‐2015,	
  ~860k)	
  
"   Discourse	
  analysis	
  in	
  public	
  opinion	
  
	
  =>	
  Project	
  DANuM,	
  with	
  linguists	
  (2013-­‐2014,	
  23k)	
  
	
   	
  =>	
  Project	
  ALICE,	
  with	
  social	
  scientists	
  and	
  specialists	
  in	
  communication	
  
	
  (just-­‐submitted)	
  
" The	
  next	
  step:	
  datamining-­‐based	
  services	
  for	
  “curation	
  support”,	
  with	
  specialist	
  in	
  
communication	
  and	
  journalists	
  
50
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   Topics	
   Clustering	
   ImagiWeb	
   Conclusion	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
Focus	
  on	
  the	
  collaboration	
  DAL/Lyon	
  
"   3	
  possible	
  scientific	
  contributions:	
  
" Labeling	
  hierarchical	
  topic	
  models	
  
" Labeling	
  dynamic	
  topic	
  models	
  
" Visualization	
  of	
  hierarchical/dynamic	
  topic	
  models	
  
51
ArCficial	
  
Neuronal	
  
Network	
  
Neuroscience	
  
OpCmizaCon	
  
Efficiency	
  
(staCsCcs)	
  
Learning	
  
theory	
  
Vision	
  
chip	
  GeneraCve	
  
model	
  
Graphical	
  
models	
  
Neural	
  
networks	
  
Background	
  
Computer	
  
vision	
  
Markov	
  
decision	
  
process	
  
ComputaCon
al	
  complexity	
  
theory	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
References	
  (excerpt)	
  
" Anokhin	
  N.,	
  J.	
  Lanagan,	
  J.	
  Velcin	
  (2012),	
  Social	
  Citation:	
  Finding	
  Roles	
  in	
  Social	
  Networks.	
  An	
  
Analysis	
  of	
  TV-­‐Series	
  Web	
  Forums.	
  Second	
  International	
  Workshop	
  on	
  Mining	
  Communities	
  and	
  
People	
  Recommenders	
  (COMMPER),	
  in	
  conjunction	
  with	
  ECML/PKDD,	
  Bristol,	
  UK.	
  
" Dermouche	
  M.,	
  J.	
  Velcin,	
  S.	
  Loudcher,	
  L.	
  Khouas	
  (2013),	
  Une	
  nouvelle	
  mesure	
  pour	
  l'évaluation	
  
des	
  méthodes	
  d'extraction	
  de	
  thématiques	
  :	
  la	
  Vraisemblance	
  Généralisée.	
  Actes	
  de	
  la	
  13ème	
  
Conférence	
  Francophone	
  sur	
  l'Extraction	
  et	
  la	
  Gestion	
  des	
  Connaissances	
  (EGC).	
  Toulouse,	
  
France.	
  
"   Forestier,	
  M.,	
  Stavrianou,	
  A.,	
  Velcin,	
  J.	
  and	
  Zighed,	
  D.A.	
  (2012),	
  Roles	
  in	
  Social	
  Networks:	
  
Methodologies	
  and	
  Research	
  Issues.	
  Web	
  Intelligence	
  and	
  Agent	
  Systems:	
  An	
  International	
  
Journal	
  (WIAS).	
  
" Musat,	
  C.,	
  Velcin,	
  J.,	
  Rizoiu,	
  M.A.	
  and	
  Trausan-­‐Matu,	
  S.	
  (2011),	
  Improving	
  Topic	
  Evaluation	
  
Using	
  Conceptual	
  Knowledge.	
  Proceedings	
  of	
  the	
  22nd	
  International	
  Joint	
  Conference	
  on	
  
Artificial	
  Intelligence	
  (IJCAI).	
  Barcelona,	
  Spain.	
  
" Rizoiu	
  M.A.,	
  J.	
  Velcin,	
  S.	
  Lallich	
  (2012),	
  Structuring	
  typical	
  evolutions	
  using	
  Temporal-­‐Driven	
  
Constrained	
  Clustering.	
  Proceedings	
  of	
  the	
  24th	
  IEEE	
  Internatinal	
  Conference	
  on	
  Tools	
  with	
  
Artificial	
  Intelligence	
  (ICTAI).	
  Athens,	
  Greece.	
  Best	
  student	
  paper	
  award.	
  
" Stavrianou,	
  A.,	
  Velcin,	
  J.	
  and	
  Chauchat,	
  J.H.	
  (2009),	
  A	
  combination	
  of	
  opinion	
  mining	
  and	
  social	
  
network	
  techniques	
  for	
  discussion	
  analysis.	
  Revue	
  des	
  Nouvelles	
  Technologies	
  de	
  l'Information	
  
(RNTI),	
  Cepadues.	
  
52
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   Topics	
   Clustering	
   ImagiWeb	
   Conclusion	
  
06/05/2013	
  
14	
  
eminar	
  at	
   housie	
  University	
  –	
  4/18/2013	
  –	
  ulien	
   elcin	
  
	
  
	
  
	
  
Thank	
  you!	
  
53
Context	
   The	
  big	
  picture	
   Online	
  discussions	
   Topics	
   Clustering	
   ImagiWeb	
   Conclusion	
  

More Related Content

Viewers also liked

Data mining based social network
Data mining based social networkData mining based social network
Data mining based social networkFiras Husseini
 
Social Targeting: Understanding Social Media Data Mining & Analysis
Social Targeting: Understanding Social Media Data Mining & AnalysisSocial Targeting: Understanding Social Media Data Mining & Analysis
Social Targeting: Understanding Social Media Data Mining & AnalysisInfini Graph
 
Survey of data mining techniques for social
Survey of data mining techniques for socialSurvey of data mining techniques for social
Survey of data mining techniques for socialFiras Husseini
 
Data mining for social media
Data mining for social mediaData mining for social media
Data mining for analyzing social media

  • 1. 06/05/2013
    Data mining for analyzing the social media
    Seminar at Dalhousie University – 4/18/2013
    Presentation: J. Velcin
    http://mediamining.univ-lyon2.fr/people/velcin
    [Diagram labels: Social networks; Video/picture sharing; Opinions; News websites; Blogs; Knowledge sharing; Microblogging]

    Seminar at Dalhousie University – 4/18/2013 – Julien Velcin
    Context | The big picture | Online discussions | ½-sup. clustering | ImagiWeb | Conclusion

    Ecosystem of ERIC Lab
    - Located in Lyon
    - BSc & MSc degrees: BI, data mining, statistics
    - 2 teams: SID & DMD
    - Links with academics and companies
    [Logo text: Axe Carrés 2 ter]

    Research landscape
    [Diagram: Data -> ETL -> Data warehouse -> Online analysis / Data mining -> Knowledge -> Decision; research themes: Complex data integration, Multidimensional modeling, Data Mining & Decision (DMD)]

    Data Mining & Decision (DMD)
    - Goal: coping with complex data
    - Practical issue, e.g. social media (social networks, microblogging, video/picture sharing, opinion sharing, news websites, blogs, knowledge sharing), which are:
      - heterogeneous
      - voluminous
      - interconnected
      - evolving
    - Approach: recommendation, summarization, information retrieval, multicriteria analysis, machine learning, graph analysis, complex data analysis, topological learning, text mining
  • 2. Outline
    - The big picture
    - Modeling and analyzing online discussions
    - Semi-supervised clustering
    - Focus on Project ImagiWeb
    - Future lines of research

    Section 1: The big picture

    From facts to people: the essential role of media
    - A long-standing question
    - Social representation through the media [Lippman,22] [Moscovici,76] [Newman and Block,06]
    - Numeric watch on the Web [Chateauraynaud,03]
    [Diagram label: Public event]
  • 3. Information overload
    [Image credit: Go-Globe.com]

    Data journalism
    - Crucial need to capture the meaning of the voluminous data produced by modern social media, in order to design new search engine systems
    - In particular (MSND workshop @ WWW'12):
      - “How to surface the best comments, videos and pictures from a variety of sources in real time and then how to verify them?”
      - “How to quickly surface the best comments and work out which ones are worth investigating further?”
      - “How to identify quickly the key influencers on any particular story, so they can get inside information or interview them for their news outlets?”

    Salvaged by (media) curation? [Rosenbaum,11]
    - Term originating from art, appearing around 2011
    - Three-step process:
      - Aggregate: gathering
      - Editorialize: sorting, categorizing, summarizing, presenting…
      - Disseminate: contextualizing, sharing
    - Important role of the curator
    - Difference between “full curation” and automatic edition (e.g., paper.li)
    - Many platforms (Scoop.it!, Storify, Storyful, Hopflow, Stumbleupon, Patch…):
      http://socialcompare.com/fr/comparison/curation-platforms-amplify-knowledge-plaza-storify

    A case study: the “HuffPost”
    - Linked with social networks
    - Topically indexed
    - Available on various devices
    - Commented news
    - Community of bloggers
    - Journalists can play the roles of both curator and community manager
  • 4. Section 2: Modeling and analyzing online discussions

    Online discussions
    - Motivation:
      - Abundant, often underused data
      - Crucial for gauging people's opinions
    - Contributions:
      - Recommending key messages [Stavrianou et al.,09,10]
      - Extracting the latent social network [Forestier et al.,11]
      - Detecting celebrities from online forums [Forestier et al.,12]
      - Surfacing roles with unsupervised mechanisms [Anokhin et al.,12]

    [Image-only slide; footer text: Julien Velcin - ARC6 presentation, 18 October 2012]
  • 5. Anatomy of an online discussion
    [Figure: a discussion thread exchanged among participants A, B, C and D, and the resulting graph linking them]

    Recommending key messages (contribution 1) [Stavrianou et al.,09,10]
    - An “interesting” message is popular, opinionated, pioneering, etc.
    - Formalization of 6 criteria + simple aggregation
    - Comparison with manually labelled data from 8 French forums
    - Results of the a priori evaluation:
      - F1-measure ranges from 0.2 to 0.3 for a single criterion
      - F1-measure equals 0.48 for the aggregated criteria (simple mean)
    - Results of the a posteriori evaluations: [figure not extracted]

    Extracting the (latent) social network (contribution 2) [Forestier et al.,11]
    - Latent SN = reply-to links + name citation + text quotation
    - Name citation: misspellings, compound names, abbreviations… (what about “obama49”?)
      - Our solution: edit distance, Soundex, PoS tagging to detect nouns
    - Text quotation: copy-paste without quotation marks, rephrasing…
      - Our solution: string matching, locality principle (comparing messages that are close to each other), use of quotation marks when provided

    Detecting celebrities (contribution 3) [Forestier et al.,12]
    - Modeling the forum discussion with a graph G=(V,E)
      - vertex v = forum participant
      - edge e = link (implicit or explicit) between two participants
    - Weighted in-degree of v: deg-(v)
    - Weighted out-degree of v: deg+(v)
    - p(v) = set of messages posted by v
    - p~ = average number of messages
    - thr(v) = set of threads not initiated by v
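The name-citation step and the weighted degrees defined above can be illustrated with a minimal sketch. It is not the authors' implementation: the edit-distance threshold `max_distance`, the unit edge weights, the edge direction (from the author of a message to the participant being answered or cited) and the sample participants are all assumptions made for the example. Soundex and PoS tagging are left out.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def resolve_mention(token, participants, max_distance=2):
    """Return the participant whose name is closest to token, or None.

    max_distance is a hypothetical tolerance for misspellings.
    """
    best = min(participants,
               key=lambda name: edit_distance(token.lower(), name.lower()))
    if edit_distance(token.lower(), best.lower()) <= max_distance:
        return best
    return None


def weighted_degrees(edges):
    """edges: iterable of (source, target, weight). Returns (deg_in, deg_out)."""
    deg_in, deg_out = defaultdict(float), defaultdict(float)
    for source, target, weight in edges:
        deg_out[source] += weight
        deg_in[target] += weight
    return dict(deg_in), dict(deg_out)


# Hypothetical forum participants and explicit reply-to links.
participants = ["alice", "bob", "carol"]
edges = [("bob", "alice", 1.0), ("carol", "alice", 1.0), ("alice", "bob", 1.0)]

# Implicit link: a message by carol cites "Allice" (misspelled).
mentioned = resolve_mention("Allice", participants)
if mentioned is not None:
    edges.append(("carol", mentioned, 1.0))

deg_in, deg_out = weighted_degrees(edges)
print(deg_in)
print(deg_out)
```

With this direction convention, a high weighted in-degree means that a participant is frequently answered or cited by others.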
  • 6. Detecting celebrities (contribution 3)
    - Extracting social roles from a SN is a key issue [Fisher et al.,06] [Himelboim et al.,09] [Forestier et al.,12]
    - Some examples of roles:
      - Leader: very active user who initiates discussion threads and keeps them going
      - Expert: user particularly active in a restricted number of topics
      - Celebrity: public person well known by the participants
      - Flamer: user with negative behavior, who can generate conflicts
      - Lurker: user with low participation in the discussion
    - In the following, we focus on the explicit “celebrity” role within online discussion forums

    Detecting celebrities
    - Formalize the criteria given by [Golder and Donath,04]
    [Table of criteria not extracted]

    Detecting celebrities
    - Based on these atomic criteria, we define 3 meta-criteria:
      - meta-criterion 1: all the basic criteria must be satisfied (necessary conditions), and the qualifying users are ranked in descending order of their total number of posts
      - meta-criterion 2: same conditions, but the ranking depends on the user's average forum participation multiplied by the number of posts
      - meta-criterion 3: same conditions, but also taking into account name citation and text quotation
    - Evaluation measure: compare the ranking produced by each meta-criterion with the number of fans of each user (>800), used as the gold standard
    - Dataset:
      - 57 forums from the US version of the Huffington Post
      - 3 topics: politics, media, living
      - Overall 11,443 unique users and 35,175 posts

    [Results figure not extracted] [Forestier et al.,12]
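The first two meta-criteria above amount to a filter followed by a sort, which can be sketched as follows. This is only an illustration under stated assumptions: the atomic criteria are represented by hypothetical boolean flags (the slides' table of criteria is not available), the users and their statistics are invented, and `precision_at_k` is one possible way of comparing a ranking with the fan-based gold standard, not necessarily the measure used by the authors.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    name: str
    posts: int                # total number of posts
    avg_participation: float  # average forum participation (assumed definition)
    criteria_ok: tuple        # one boolean per atomic criterion (hypothetical)
    fans: int                 # number of fans, used for the gold standard


def candidates(users):
    """Keep only users who satisfy every atomic criterion."""
    return [u for u in users if all(u.criteria_ok)]


def rank_meta1(users):
    """Meta-criterion 1: rank candidates by total number of posts."""
    ranked = sorted(candidates(users), key=lambda u: u.posts, reverse=True)
    return [u.name for u in ranked]


def rank_meta2(users):
    """Meta-criterion 2: rank by average participation times number of posts."""
    ranked = sorted(candidates(users),
                    key=lambda u: u.avg_participation * u.posts, reverse=True)
    return [u.name for u in ranked]


def precision_at_k(ranking, gold, k):
    """Share of the top-k ranked users that belong to the gold standard."""
    return sum(1 for name in ranking[:k] if name in gold) / k


# Invented sample data.
users = [
    UserStats("ann", 120, 2.0, (True, True), 1500),
    UserStats("ben", 300, 0.5, (True, False), 2000),
    UserStats("cat", 80, 4.0, (True, True), 900),
    UserStats("dan", 10, 1.0, (True, True), 12),
]
gold = {u.name for u in users if u.fans > 800}  # threshold taken from the slides

print(rank_meta1(users))
print(rank_meta2(users))
print(precision_at_k(rank_meta1(users), gold, 2))
```

Note that `ben` is excluded despite having the most posts, because the atomic criteria act as necessary conditions before any ranking takes place.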
  • 7. Surfacing roles (contribution 4)
    - New collaboration between [logo] and [logo] [Logo text: Axe Carrés 2 ter]
    - Bottom-up “emerging” roles: [figure not extracted]

    Surfacing roles
    - Discussions about 6 popular TV shows from TWOP forums
    - The parent-child relationship is restored using the “quote” mechanism:
      - check the previous 20 messages in the thread;
      - a parent has to contain at least 95% of the quoted text.

    Surfacing roles
    - Profiling users using temporal-aware features:
      - weighted in-degree,
      - weighted out-degree,
      - node in-g-index,
      - node out-g-index,
      - catalytic power,
      - number of posts,
      - cross-topic entropy.
    - The role identification procedure is applied to the time series of feature vectors of 1,263 forum users.
    - Using moving time windows (size = 1 week, shift = 1 day)

    Surfacing roles [Anokhin et al.,12]
    - Clustering time series
    - Basic k-means algorithm
    - Hartigan's index used for estimating the best k
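The quote-based restoration of parent-child links described above (look back over the previous 20 messages, require that a parent contain at least 95% of the quoted text) can be sketched as below. The slides do not say how the 95% is measured; this sketch assumes a word-level, order-preserving overlap computed with Python's `difflib.SequenceMatcher`, and the sample thread is invented.

```python
from difflib import SequenceMatcher

LOOKBACK = 20        # number of previous messages inspected (from the slides)
MIN_COVERAGE = 0.95  # share of the quoted text a parent must contain (from the slides)


def coverage(quoted: str, candidate: str) -> float:
    """Fraction of the quoted words found, in order, in the candidate message.

    Word-level matching is an assumption of this sketch.
    """
    quoted_words = quoted.lower().split()
    candidate_words = candidate.lower().split()
    if not quoted_words:
        return 0.0
    matcher = SequenceMatcher(None, quoted_words, candidate_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(quoted_words)


def find_parent(thread, index, quoted):
    """Return the index of the most recent message that contains the quote.

    thread: list of earlier message texts, in posting order.
    index: position of the quoting message (may equal len(thread)).
    Returns None when no message in the look-back window qualifies.
    """
    start = max(0, index - LOOKBACK)
    for position in range(index - 1, start - 1, -1):
        if coverage(quoted, thread[position]) >= MIN_COVERAGE:
            return position
    return None


# Invented thread: the fourth message quotes the first one.
thread = [
    "I think the finale was rushed and the ending made no sense",
    "Totally disagree, loved it",
    "The acting was great though",
]
quote = "the finale was rushed and the ending made no sense"
print(find_parent(thread, 3, quote))
```

Scanning from the most recent message backwards applies a locality principle: when several earlier messages contain the quoted text, the closest one is taken as the parent.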
  • 8. Surfacing roles (contribution 4)
    - Some observations: [figure not extracted]

    Section 4: Semi-supervised clustering

    Temporal-driven clustering
    - Goal: detecting typical patterns over time
    - How to deal with temporally described entities?
    - Applications:
      - Evolution of nations' political states (proof of concept)
      - Trajectories over roles
      - Evolution of entities' images (cf. ImagiWeb)
    [Figure: observations x1 to x6 described by features φ1 and φ2 at timestamps t1, t2 and t3]
  • 9. Temporal-driven clustering
    - Detect typical evolution patterns of individuals in the dataset:
      - phases through which the entity collection went over time
      - trajectory of entities through the different phases

    Temporal-aware constrained clustering
    - The resulting partition must ensure:
      - descriptive coherence of clusters;
      - temporal coherence of clusters;
      - continuous segmentation of the observations belonging to an entity
    - Objective function to minimize (inspired by semi-supervised clustering [Wagstaff and Cardie,00]) + use of a K-Means-like algorithm:
      [Equation not extracted; its two annotated terms are (a) a temporal-aware dissimilarity measure and (b) a contiguity penalty measure]

    Experiments on political dataset
    - 23 countries, 60 years
    - 207 political, demographic, social and economic variables
    - Running TDCK-Means (8 clusters, β = 0.003 and δ = 3)

    Experiments on political dataset
    [Results figure not extracted]
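Since the objective function itself did not survive extraction, the following sketch only illustrates the general shape that the two named terms suggest: a dissimilarity that mixes descriptive and temporal distance, plus a penalty whenever two observations of the same entity fall into different clusters. The functional forms chosen here (a product of normalized squared distances, a Gaussian-shaped penalty) are assumptions for illustration and should not be read as the formula shown in the talk. Only the parameter names β and δ and their values come from the slides; the sample data are invented.

```python
import math


def temporal_aware_dissimilarity(x, t_x, y, t_y, max_feature_dist, max_time_gap):
    """Value in [0, 1]: small only if x and y are close in features AND in time."""
    feature_part = (math.dist(x, y) / max_feature_dist) ** 2
    time_part = (abs(t_x - t_y) / max_time_gap) ** 2
    return 1.0 - (1.0 - feature_part) * (1.0 - time_part)


def contiguity_penalty(t_x, t_y, beta, delta):
    """Cost of separating two observations of one entity; decays with the time gap."""
    return beta * math.exp(-0.5 * ((t_x - t_y) / delta) ** 2)


def objective(observations, assignment, centroids,
              beta=0.003, delta=3.0, max_feature_dist=1.0, max_time_gap=2.0):
    """Total cost of a partition.

    observations: list of (entity, timestamp, feature_tuple)
    assignment:   cluster index of each observation
    centroids:    list of (timestamp, feature_tuple), one per cluster
    """
    total = 0.0
    for i, (entity, t, x) in enumerate(observations):
        c_t, c_x = centroids[assignment[i]]
        # Term (a): distance of the observation to its cluster centroid.
        total += temporal_aware_dissimilarity(x, t, c_x, c_t,
                                              max_feature_dist, max_time_gap)
        # Term (b): same entity, different cluster (counted once per ordered pair).
        for j, (other_entity, other_t, _) in enumerate(observations):
            if j != i and other_entity == entity and assignment[j] != assignment[i]:
                total += contiguity_penalty(t, other_t, beta, delta)
    return total


# Invented data: one entity observed at three timestamps, one feature.
observations = [("A", 0.0, (0.0,)), ("A", 1.0, (0.1,)), ("A", 2.0, (1.0,))]
centroids = [(0.5, (0.05,)), (2.0, (1.0,))]

contiguous = objective(observations, [0, 0, 1], centroids)
fragmented = objective(observations, [0, 1, 0], centroids)
print(contiguous, fragmented)
```

In a K-Means-like loop, each observation would be reassigned to the cluster that minimizes its contribution to this cost, and the centroids would then be updated; the sketch only evaluates the cost of a given partition.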
  • 10. Experiments on political dataset
      [Figure: results]

    Section 5: Focus on Project ImagiWeb
      http://eric.univ-lyon2.fr/~jvelcin/imagiweb
  • 11. Project ImagiWeb
      - Goal of Project ANR ImagiWeb: analyzing the life cycle (production, diffusion, evolution) of images through the Web 2.0
      - Strong points:
          - Joint analysis of opinions, topics, social networks…
          - Involvement of (true) researchers in LLSSH
      - Partners:
          - ERIC: data mining, machine learning
          - LIA: text/opinion mining, information retrieval
          - CEPEL: social scientists, specialists in political studies
          - XRCE: information extraction, NLP
          - AMI Soft.: numeric watch
          - EDF R&D: end-user, semiology study
      [Diagram, labels translated from French: Entities (e.g. companies, politicians); Internet users; emitters; receivers; emitted image; perceived image; comments; communication media (websites, brochures, etc.); analysis of expression data; analysis of populations; feedback; IMAGIWEB]

    Platform for performing the annotation
      - Web applications designed for annotating ~10k tweets + 200 blog comments; 22 annotators are working on it right now!
      - Output: (mφ ; mt ; mp ; ma ; mt ; ms)
  • 12. Catching image's evolution over time
      - Input: set of tuples (mφ ; mt ; mp ; ma ; mt ; ms)
      - Some good questions:
          - What is an image?
          - How to sum up the bunch of (temporally-situated and spatially-located) opinions?
      - First insight: investigating time series analysis, temporally-driven clustering, graphical models…
      - Fortunately we'll have a full-time post-doc to work on it!

    Recent work on opinion mining
      - Participation in SemEval 2013
          - Task 2.B: Discriminating positive (+) from negative (-) opinions (+ neutral)
          - Very recent work: improving basic Naive Bayes (NB) by using background knowledge (seed lists)
          - 6/35 and 3/16 on the official tweet dataset!
      - Results on our own datasets: [table] [paper just submitted]

    Section 6: Future lines of research
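The "Recent work on opinion mining" slide mentions improving basic Naive Bayes with background knowledge in the form of seed lists, without giving the mechanism. One plausible way to do this, shown here purely as an assumed illustration and not as the authors' method, is to add pseudo-counts for seed words to their class before smoothing. The class name `SeededNaiveBayes` and the parameters `alpha` and `seed_weight` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


class SeededNaiveBayes:
    """Multinomial Naive Bayes where seed words receive extra pseudo-counts
    in their class (an assumed way to inject background knowledge)."""

    def __init__(self, seeds: Dict[str, Iterable[str]],
                 alpha: float = 1.0, seed_weight: float = 5.0):
        self.seeds = {label: list(words) for label, words in seeds.items()}
        self.alpha = alpha              # Laplace smoothing constant
        self.seed_weight = seed_weight  # pseudo-count added per seed word

    def fit(self, docs: Sequence[Sequence[str]], labels: Sequence[str]):
        self.classes = sorted(set(labels) | set(self.seeds))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        # Counts observed in the labeled training documents.
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        # Background knowledge: boost each seed word in its own class.
        for label, words in self.seeds.items():
            for word in words:
                self.word_counts[label][word] += self.seed_weight
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, tokens: Sequence[str]) -> Dict[str, float]:
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            prior = (self.doc_counts[c] + self.alpha) / (
                n_docs + self.alpha * len(self.classes))
            denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
            score = math.log(prior)
            for w in tokens:
                if w in self.vocab:  # words never seen anywhere are ignored
                    score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, tokens: Sequence[str]) -> str:
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

With such a scheme, a seed word like "great" can influence the prediction even if it never occurs in the labeled tweets, which is the kind of benefit seed lists are usually expected to bring to small training sets.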
  • 13. An integrated view: Research + tools + applications
      - Ongoing Research
          - Structured temporal-driven clustering (M. A. Rizoiu, PhD student)
          - Bridging the gap between topics and concepts (M. A. Rizoiu, PhD student)
          - Multi-document summarization of online discussions (C. Cercel, PhD student, in collaboration with the Polytechnic Institute of Bucharest)
          - Bottom-up, dynamic extraction of roles (A. Lumbreras, PhD student, in collaboration with Technicolor)
          - Dynamic joint extraction of topics and opinions (M. Dermouche, PhD student, in collaboration with AMI Software)
          - Extracting opinionated images from tweets and blogs in an unsupervised way (Y. Kim, post-doc, in collaboration with LIA)
      - Tools
          - MediaMining: a full open-access platform for analyzing online discussions
      - Applications
          - Reputation management services
              => Project ImagiWeb, with specialists in political studies (2012-2015, ~860k)
          - Discourse analysis in public opinion
              => Project DANuM, with linguists (2013-2014, 23k)
              => Project ALICE, with social scientists and specialists in communication (just submitted)
      - The next step: data-mining-based services for "curation support", with specialists in communication and journalists

    Focus on the collaboration DAL/Lyon
      - 3 possible scientific contributions:
          - Labeling hierarchical topic models
          - Labeling dynamic topic models
          - Visualization of hierarchical/dynamic topic models
      [Figure: topic graph with nodes Artificial Neuronal Network, Neuroscience, Optimization, Efficiency (statistics), Learning theory, Vision chip, Generative model, Graphical models, Neural networks, Background, Computer vision, Markov decision process, Computational complexity theory]

    References (excerpt)
      - Anokhin, N., Lanagan, J. and Velcin, J. (2012), Social Citation: Finding Roles in Social Networks. An Analysis of TV-Series Web Forums. Second International Workshop on Mining Communities and People Recommenders (COMMPER), in conjunction with ECML/PKDD, Bristol, UK.
      - Dermouche, M., Velcin, J., Loudcher, S. and Khouas, L. (2013), Une nouvelle mesure pour l'évaluation des méthodes d'extraction de thématiques : la Vraisemblance Généralisée. Actes de la 13ème Conférence Francophone sur l'Extraction et la Gestion des Connaissances (EGC), Toulouse, France.
      - Forestier, M., Stavrianou, A., Velcin, J. and Zighed, D.A. (2012), Roles in Social Networks: Methodologies and Research Issues. Web Intelligence and Agent Systems: An International Journal (WIAS).
      - Musat, C., Velcin, J., Rizoiu, M.A. and Trausan-Matu, S. (2011), Improving Topic Evaluation Using Conceptual Knowledge. Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain.
      - Rizoiu, M.A., Velcin, J. and Lallich, S. (2012), Structuring typical evolutions using Temporal-Driven Constrained Clustering. Proceedings of the 24th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Athens, Greece. Best student paper award.
      - Stavrianou, A., Velcin, J. and Chauchat, J.H. (2009), A combination of opinion mining and social network techniques for discussion analysis. Revue des Nouvelles Technologies de l'Information (RNTI), Cepadues.
  • 14. Thank you!