Extracting interesting concepts from large-scale textual data

Over the past few years large-scale textual resources, such as news articles, social media, search queries or even digitised books, have been the centre of various research efforts. In particular, the open nature of the microblogging platform of Twitter provided the unique opportunity for various appealing ideas to be evaluated. Based on the hypothesis that this online stream of content should represent at least a biased fraction of real-world situations or opinions, we have proposed core algorithms for estimating the current rate of an infectious disease, such as influenza, or even of a natural, less stable phenomenon like rainfall rates. A simplified emotion analysis on a longitudinal set of tweets revealed interesting patterns, including signs of rising anger and fear before the UK riots in August, 2011. A similar analysis on the Google N-gram book database uncovered collective patterns of affect over the course of the 20th century. Through the extension of linear text regression approaches by the addition of a user dimension, we proposed a family of bilinear regularised regression models, which found application in the approximation of voting intention trends. Finally, we reversed the previous modelling principle in an attempt to qualitatively analyse user attributes or behaviours.

Published in: Science

  1. New Frontiers of Automated Content Analysis in the Social Sciences (ACA) — July 1-3, 2015
     Extracting interesting concepts from large-scale textual data
     Vasileios Lampos, University College London
  2. Minimising the usual introduction
     + the Internet ‘revolution’
     + successful web products feeding from user activity (search engines, social networks)
     + large volumes of digitised data (‘Big Data’)
     + lots of user-generated text & activity logs
     Can we arrive at a better understanding of our ‘world’ from this data?
  4. Overview
     A. Prefixed keyword-based mining
     B. Automating feature selection
     C. User-centric (bilinear) modelling
     D. Inferring user characteristics
     Extracting interesting concepts from large-scale textual data
  5. Word taxonomies for emotion
     WordNet Affect
     + builds on WordNet — automated word selection
     + anger, disgust, fear, joy, sadness, surprise
     (Strapparava & Valitutti, 2004)
     Linguistic Inquiry and Word Count (LIWC)
     + taxonomies have been evaluated by human judges
     + affect, anger, anxiety, sadness, negative or positive emotions
     (Pennebaker et al., 2007)
  6. Applying emotion taxonomies on Google Books
  7. Emotion Score = mean of normalised emotion term frequencies
     Left: Joy minus Sadness — WWII, Baby Boom, Great Depression
     Right: Emotional expression in English books decreases over the years
     [Figure: yearly time series, 1900-2000, of Joy minus Sadness (z-scores) and Emotion minus Random (z-scores), with All, Fear and Disgust series]
     (Acerbi, Lampos, Garnett & Bentley, 2013)
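As a rough illustration of the "mean of normalised emotion term frequencies" score above, here is a minimal Python sketch. The `term_freqs` dictionary and the `joy_terms` / `sadness_terms` lists are hypothetical placeholders, and the z-scoring step simply mirrors the z-score axes shown on the plots; this is not the exact pipeline of Acerbi et al. (2013).

```python
import numpy as np

def emotion_score(term_freqs, emotion_terms):
    """Mean of normalised emotion term frequencies per year.

    term_freqs: dict mapping term -> 1-D array of normalised yearly
                frequencies (same length for every term).
    emotion_terms: iterable of terms in an emotion taxonomy
                   (e.g. a WordNet Affect 'joy' list).
    Returns one score per year.
    """
    rows = [term_freqs[t] for t in emotion_terms if t in term_freqs]
    return np.mean(np.vstack(rows), axis=0)

def zscore(x):
    # z-scoring, matching the z-score axes on the slide's plots
    return (x - x.mean()) / x.std()

# hypothetical usage, as in the left panel of the slide:
# joy_minus_sadness = zscore(emotion_score(freqs, joy_terms)) \
#                     - zscore(emotion_score(freqs, sadness_terms))
```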
  8. Literary Misery (LM = Sadness - Joy) vs. Economic Misery (EM, 10-year past-moving average) for books written in American English and US financial indices
     EM = Inflation + Unemployment
     [Figure: misery score per year, 1900-2000, comparing LM_t with the smoothed economic series EM_t^(11)]
     (Bentley, Acerbi, Ormerod & Lampos, 2014)
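A small sketch of the Economic Misery construction described above (EM = Inflation + Unemployment, smoothed with a past-moving average). The exact window handling (11 values, i.e. the current year plus the previous ten, suggested by the EM_t^(11) label) is an assumption, and `inflation` / `unemployment` are hypothetical yearly series.

```python
import numpy as np

def economic_misery(inflation, unemployment):
    # misery index per year: inflation rate plus unemployment rate
    return np.asarray(inflation, dtype=float) + np.asarray(unemployment, dtype=float)

def past_moving_average(x, window=11):
    """Trailing mean over the current year and the previous window-1 years."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(window - 1, len(x)):
        out[t] = x[t - window + 1 : t + 1].mean()
    return out

# em_smoothed = past_moving_average(economic_misery(inflation, unemployment))
```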
  9. …and now let’s apply keyword-based sentiment extraction tools to Twitter content
  10. Collective mood patterns (UK)
      Top: ‘joy’ time series across 3 years
      Bottom: rate of mood change for ‘anger’ and ‘fear’ (50-day window); peaks indicate increase in mood change
      [Figures: (top) "Time Series for Joy in Twitter Content", raw and 14-day smoothed joy signal from Jul 2009 to Jan 2012, with peaks at Christmas, New Year, Valentine's Day and the Royal Wedding; (bottom) "Rate of Mood Change by Day using the Difference in 50-day Mean" for Anger and Fear, with the dates of the budget cuts and the riots marked]
      (Lansdall-Welfare, Lampos & Cristianini, 2012)
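The "rate of mood change" panel is described above as a difference in 50-day means. A hedged sketch of that idea, assuming a symmetric window around each day (the exact windowing in the original study may differ):

```python
import numpy as np

def rate_of_mood_change(daily_scores, window=50):
    """Difference between the mean of the next `window` days and the
    mean of the previous `window` days, computed for each day."""
    x = np.asarray(daily_scores, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(window, len(x) - window):
        out[t] = x[t : t + window].mean() - x[t - window : t].mean()
    return out
```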
  11. Overview
      A. Prefixed keyword-based mining
      B. Automating feature selection
      C. User-centric (bilinear) modelling
      D. Inferring user characteristics
      Extracting interesting concepts from large-scale textual data
  12. The case of influenza-like illness (ILI)
      + existence of ‘ground truth’ enables optimisation of keyword selection
      + supervised learning task (f: X —> y)
      Case study: nowcasting ILI rates
      + infer ILI rates based on user data
      + ‘ground truth’ provided via traditional health surveillance schemes
      + complementary disease indicator
      + earlier warning
      + applicable to parts of the world with less comprehensive healthcare systems
      + noisy, biased demographics, media bias
      (Lampos & Cristianini, 2010; Lampos, De Bie & Cristianini, 2010)
  13. The case of influenza-like illness (ILI)
      + existence of ‘ground truth’ enables optimisation of keyword selection
      + supervised learning task (f: X —> y)
      Case study: nowcasting ILI rates
      + infer ILI rates based on Twitter data
      + ‘ground truth’ provided via traditional health surveillance schemes
      + complementary disease indicator
      + earlier warning
      + applicable to parts of the world with less comprehensive healthcare systems
      + noisy, biased demographics, media bias
      [Screenshot: first page of "Tracking the flu pandemic by monitoring the Social Web" (Lampos & Cristianini), including Fig. 1: official ILI regional rates from the Health Protection Agency (HPA), regions A-E, weeks 26-49, 2009]
      (Lampos & Cristianini, 2010 and Lampos, De Bie & Cristianini, 2010)
  14. The case of influenza-like illness (ILI)
      Twitter data
      + 27 million tweets from 54 UK urban centres
      + June 22 to December 6, 2009
      Health surveillance data
      + ILI rates expressing GP consultations per 100,000 people, where the diagnosis was ILI
      Feature extraction
      + a few handcrafted terms, and
      + all unigrams from related websites (Wikipedia, NHS, etc.)
      = 1,560 stemmed unigrams (most of which unrelated)
      (Lampos & Cristianini, 2010 and Lampos, De Bie & Cristianini, 2010)
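A minimal sketch of turning such a daily tweet stream into a day-by-unigram frequency matrix. The normalisation by the number of tweets per day, the whitespace tokenisation and the use of NLTK's Porter stemmer are assumptions for illustration; `tweets_per_day` and `vocabulary` are placeholders.

```python
from collections import Counter
import numpy as np
from nltk.stem import PorterStemmer  # stemming, as for the stemmed unigrams above

stemmer = PorterStemmer()

def daily_feature_matrix(tweets_per_day, vocabulary):
    """Rows: days, columns: candidate stemmed unigrams.
    Each entry is the unigram's count normalised by the number of
    tweets that day. `tweets_per_day` is a list of lists of strings."""
    vocab_index = {w: j for j, w in enumerate(vocabulary)}
    X = np.zeros((len(tweets_per_day), len(vocabulary)))
    for i, tweets in enumerate(tweets_per_day):
        counts = Counter()
        for tweet in tweets:
            counts.update(stemmer.stem(tok) for tok in tweet.lower().split())
        for w, c in counts.items():
            if w in vocab_index:
                X[i, vocab_index[w]] = c / max(len(tweets), 1)
    return X
```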
  15. Regularised text regression — regression basics: the lasso
      + observations $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, \dots, n\}$ — $\mathbf{X}$
      + responses $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$ — $\mathbf{y}$
      + weights, bias $w_j, \beta \in \mathbb{R}$, $j \in \{1, \dots, m\}$ — $\mathbf{w}^{*} = [\mathbf{w}; \beta]$
      $\ell_1$-norm regularisation, broadly known as the ‘lasso’ (Tibshirani, 1996):
      $\operatorname*{argmin}_{\mathbf{w},\beta} \Big\{ \sum_{i=1}^{n} \big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \big)^2 + \lambda \sum_{j=1}^{m} |w_j| \Big\}$
      or, equivalently, $\operatorname*{argmin}_{\mathbf{w}^{*}} \big\{ \|\mathbf{X}^{*}\mathbf{w}^{*} - \mathbf{y}\|_{\ell_2}^2 + \lambda \|\mathbf{w}\|_{\ell_1} \big\}$
      (−) no closed-form solution — quadratic programming problem
      (+) Least Angle Regression explores the entire regularisation path (Efron et al., 2004)
      (+) sparse $\mathbf{w}$, interpretability, better performance (Hastie et al., 2009)
      (−) if m > n, at most n variables can be selected
      (−) strongly correlated predictors → model-inconsistent (Zhao & Yu, 2006)
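For a concrete, hedged example of the lasso objective above, scikit-learn provides both a coordinate-descent `Lasso` and the Least Angle Regression path mentioned on the slide. The arrays below are random stand-ins for the day-by-unigram matrix and the ILI rates; the regularisation strength is illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, lars_path

# X: n_days x n_unigrams frequency matrix, y: official ILI rates
X = np.random.rand(120, 1560)   # stand-in data, for illustration only
y = np.random.rand(120)

model = Lasso(alpha=0.01)       # alpha plays the role of lambda above
model.fit(X, y)
selected = np.flatnonzero(model.coef_)   # sparse w: the selected unigrams
print(len(selected), "unigrams with nonzero weight")

# Least Angle Regression (Efron et al., 2004) traces the full regularisation path:
alphas, _, coefs = lars_path(X, y, method="lasso")
```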
  16. Manual vs. automated feature selection
      41 handcrafted markers (r = 0.86): blood, cold, cough, dizzy, feel sick, feeling unwell, fever, flu, headache, runny nose, shivers, sore throat, stomach ache (…)
      Automatically selected unigrams (r = 0.97): lung, unwel, temperatur, like, headach, season, unusu, chronic, child, dai, appetit, stai, symptom, spread, diarrhoea, start, muscl, weaken, immun, feel, liver (…)
      [Figures: inferred vs. HPA flu rates over days 180-340 of 2009, and Twitter flu-score vs. HPA flu rate (z-scores) for region D]
      (Lampos & Cristianini, 2010)
  17. Robustifying the previous algorithm
      Lasso may not select the true model due to collinearities in the feature space (Zhao & Yu, 2006)
      Bootstrap lasso (‘bolasso’) for feature selection (Bach, 2008)
      + For a number (N) of bootstraps, i.e. iterations
        + Sample the feature space with replacement (X_i)
        + Learn a new model (w_i) by applying lasso on X_i and y
        + Remember the n-grams with nonzero weights
      + Select the n-grams with nonzero weights in p% of the N bootstraps
      + p can be optimised using a held-out validation set
      — Will all this generalise to a different case study?
      (Lampos, De Bie & Cristianini, 2010 and Lampos & Cristianini, 2012)
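A compact sketch of the bolasso procedure listed above. The sketch bootstraps (X, y) pairs with replacement, which is the standard bolasso setup; N, alpha and the threshold p are illustrative values rather than the settings used in the original experiments.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_select(X, y, n_bootstraps=100, alpha=0.01, p=0.8, seed=0):
    """Bootstrap lasso ('bolasso') feature selection: fit a lasso on each
    bootstrap sample and keep features with nonzero weight in at least a
    fraction p of the N bootstraps."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    nonzero_counts = np.zeros(m)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)            # sample with replacement
        w = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        nonzero_counts += (w != 0)
    return np.flatnonzero(nonzero_counts >= p * n_bootstraps)
```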
  19. Word cloud — Selected n-grams for ILI
  20. So, apart from flu, we also tried to nowcast rainfall rates.
  21. Word cloud — Selected n-grams for rain
  22. Inference examples
      Top: ILI rates (S. England; C. England & Wales)
      Bottom: Rainfall rates (Bristol, UK; London, UK)
      [Figures: actual vs. inferred ILI rates and rainfall rates (mm) over 30-day periods]
  23. Overview
      A. Prefixed keyword-based mining
      B. Automating feature selection
      C. User-centric (bilinear) modelling
      D. Inferring user characteristics
      Extracting interesting concepts from large-scale textual data
  24. User-centric modelling: why?
      + text regression models usually focus on the word space
      + social media context —> words, but also users
      + models may benefit by incorporating a form of user contribution in the current word modelling
      + in this way more relevant users contribute more, and irrelevant users may be filtered out
      (Lampos, Preotiuc-Pietro & Cohn, 2013)
  25. ‘bilinear’ modelling: definition
      Linear regression: $f(\mathbf{x}_i) = \mathbf{x}_i^{\mathsf{T}} \mathbf{w} + \beta$
      + observations $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, \dots, n\}$ — $\mathbf{X}$
      + responses $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$ — $\mathbf{y}$
      + weights, bias $w_j, \beta \in \mathbb{R}$, $j \in \{1, \dots, m\}$ — $\mathbf{w}^{*} = [\mathbf{w}; \beta]$
      Bilinear regression: $f(\mathbf{Q}_i) = \mathbf{u}^{\mathsf{T}} \mathbf{Q}_i \mathbf{w} + \beta$
      + users $p \in \mathbb{Z}^{+}$
      + observations $\mathbf{Q}_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$ — $\mathbf{X}$
      + responses $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$ — $\mathbf{y}$
      + weights, bias $u_k, w_j, \beta \in \mathbb{R}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$ — $\mathbf{u}, \mathbf{w}, \beta$
  26. ‘bilinear’ modelling: definition
      $f(\mathbf{Q}_i) = \mathbf{u}^{\mathsf{T}} \mathbf{Q}_i \mathbf{w} + \beta$, i.e. the row vector of user weights $\mathbf{u}^{\mathsf{T}}$ times the user-by-word matrix $\mathbf{Q}_i$ times the column vector of word weights $\mathbf{w}$, plus the bias $\beta$
      + users $p \in \mathbb{Z}^{+}$
      + observations $\mathbf{Q}_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$ — $\mathbf{X}$
      + responses $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$ — $\mathbf{y}$
      + weights, bias $u_k, w_j, \beta \in \mathbb{R}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$ — $\mathbf{u}, \mathbf{w}, \beta$
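The bilinear prediction itself is a single matrix-vector expression; a minimal sketch with toy dimensions (all values below are random placeholders, not real data):

```python
import numpy as np

def bilinear_predict(Q, u, w, beta):
    """f(Q_i) = u^T Q_i w + beta for a single user-by-word matrix Q_i."""
    return float(u @ Q @ w + beta)

p, m = 5, 12                 # p users, m words
rng = np.random.default_rng(0)
Q_i = rng.random((p, m))     # user-by-word frequencies for one time step
u = rng.random(p)            # user weights
w = rng.random(m)            # word weights
beta = 0.1
print(bilinear_predict(Q_i, u, w, beta))
```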
  27. Bilinear regularised regression
      $\operatorname*{argmin}_{\mathbf{u},\mathbf{w},\beta} \Big\{ \sum_{i=1}^{n} \big( \mathbf{u}^{\mathsf{T}} \mathbf{Q}_i \mathbf{w} + \beta - y_i \big)^2 + \psi(\mathbf{u}, \theta_u) + \psi(\mathbf{w}, \theta_w) \Big\}$
      $\psi(\cdot)$: regularisation function with a set of hyper-parameters ($\theta$)
      + if $\psi(\mathbf{v}, \lambda) = \lambda \|\mathbf{v}\|_{\ell_1}$ → Bilinear Lasso
      + if $\psi(\mathbf{v}, \lambda_1, \lambda_2) = \lambda_1 \|\mathbf{v}\|_{\ell_2}^2 + \lambda_2 \|\mathbf{v}\|_{\ell_1}$ → Bilinear Elastic Net (BEN)
      (Lampos, Preotiuc-Pietro & Cohn, 2013)
  28. An extension: bilinear & multi-task
      + optimise (learn the model parameters for) a number of tasks jointly
      + attempt to improve generalisation by exploiting domain-specific information of related tasks
      + good choice for under-sampled distributions (knowledge transfer)
      + application-driven reasons (e.g. voting intention modelling)
      (Caruana, 1997; Lampos, Preotiuc-Pietro & Cohn, 2013)
  29. Bilinear multi-task text regression
      + tasks $\tau \in \mathbb{Z}^{+}$
      + users $p \in \mathbb{Z}^{+}$
      + observations $\mathbf{Q}_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$ — $\mathbf{X}$
      + responses $\mathbf{y}_i \in \mathbb{R}^{\tau}$, $i \in \{1, \dots, n\}$ — $\mathbf{Y}$
      + weights, bias $\mathbf{u}_k, \mathbf{w}_j, \boldsymbol{\beta} \in \mathbb{R}^{\tau}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$ — $\mathbf{U}, \mathbf{W}, \boldsymbol{\beta}$
      $f(\mathbf{Q}_i) = \operatorname{tr}\big( \mathbf{U}^{\mathsf{T}} \mathbf{Q}_i \mathbf{W} \big) + \boldsymbol{\beta}$
  30. Bilinear Group $\ell_{2,1}$ (BGL)
      Same setting as the bilinear multi-task model above, with the objective
      $\operatorname*{argmin}_{\mathbf{U},\mathbf{W},\boldsymbol{\beta}} \Big\{ \sum_{t=1}^{\tau} \sum_{i=1}^{n} \big( \mathbf{u}_t^{\mathsf{T}} \mathbf{Q}_i \mathbf{w}_t + \beta_t - y_{ti} \big)^2 + \lambda_u \sum_{k=1}^{p} \|\mathbf{U}_k\|_2 + \lambda_w \sum_{j=1}^{m} \|\mathbf{W}_j\|_2 \Big\}$
      + BGL can be broken into 2 convex tasks: first learn $\{\mathbf{W}, \boldsymbol{\beta}\}$, then $\{\mathbf{U}, \boldsymbol{\beta}\}$, and vice versa
      (Argyriou et al., 2008)
  31. BGL’s main property
      $\operatorname*{argmin}_{\mathbf{U},\mathbf{W},\boldsymbol{\beta}} \Big\{ \sum_{t=1}^{\tau} \sum_{i=1}^{n} \big( \mathbf{u}_t^{\mathsf{T}} \mathbf{Q}_i \mathbf{w}_t + \beta_t - y_{ti} \big)^2 + \lambda_u \sum_{k=1}^{p} \|\mathbf{U}_k\|_2 + \lambda_w \sum_{j=1}^{m} \|\mathbf{W}_j\|_2 \Big\}$
      + a feature (user or word) is usually selected (activated) for all tasks, not just one, but possibly with different weights
      + useful in the domain of political preference inference
      + BGL can be broken into 2 convex tasks: first learn $\{\mathbf{W}, \boldsymbol{\beta}\}$, then $\{\mathbf{U}, \boldsymbol{\beta}\}$, and vice versa; iterate through this process
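A hedged sketch of the BGL objective above. The group penalty is the l2,1 norm, i.e. the sum of row-wise l2 norms of U and W, so a user's (or word's) weights across all tasks are switched on or off together. Shapes and the `Q`, `Y` inputs are placeholders for the data described on the previous slides.

```python
import numpy as np

def group_l21(M):
    """l2,1 norm: sum of the row-wise l2 norms. For U (users x tasks) or
    W (words x tasks), a row collects one feature's weights across all
    tasks, so shrinking a whole row to zero removes that feature from
    every task at once."""
    return np.linalg.norm(M, axis=1).sum()

def bgl_objective(U, W, beta, Q, Y, lambda_u, lambda_w):
    """Squared loss of the bilinear multi-task model plus the two group
    penalties, following the BGL objective on the slide."""
    loss = 0.0
    for i, Q_i in enumerate(Q):                 # Q: list of p x m matrices
        # per-task predictions u_t^T Q_i w_t + beta_t
        preds = np.einsum('pt,pm,mt->t', U, Q_i, W) + beta
        loss += np.sum((preds - Y[i]) ** 2)     # Y: n x tasks responses
    return loss + lambda_u * group_l21(U) + lambda_w * group_l21(W)
```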
  32. Inferring voting intention via Twitter
      Left side: UK — 3 parties, 42K users (~ to regional population), 81K unigrams, 240 polls, 2 years
      Right side: Austria — 4 parties, 1.1K manually selected users, 23K unigrams, 98 polls, 1 year
      [Figures: voting intention (%) over time for CON, LAB, LIB (UK) and SPÖ, ÖVP, FPÖ, GRÜ (Austria), comparing BEN and BGL inferences with published polls (YouGov for the UK)]
  33. Performance figures — BGL prevails
      [Bar chart: Root Mean Squared Error on the UK and Austria datasets for Mean poll, Last poll, Elastic Net (words), BEN and BGL; reported values range from 1.439 to 3.067, with BGL achieving the lowest error]
  34. BGL-scored tweet examples (Austria)
      Party (BGL score, user type): tweet
      SPÖ (0.745, Journalist): "Inflation rate in Austria slightly down in July from 2.2 to 2.1%. Accommodation, Water, Energy more expensive."
      ÖVP (-2.323, User): "Can really recommend the book 'Res Publica' by Johannes #Voggenhuber! Food for thought and so on #Europe #Democracy"
      FPÖ (-3.44, Human rights): "Campaign of the Viennese SPO on 'Living together' plays right into the hands of right-wing populists"
      GRÜ (1.45, Student Union): "Protest songs against the closing-down of the bachelor course of International Development: <link> #ID_remains #UniBurns #UniRage"
  35. Overview
      A. Prefixed keyword-based mining
      B. Automating feature selection
      C. User-centric (bilinear) modelling
      D. Inferring user characteristics
      Extracting interesting concepts from large-scale textual data
  36. Predicting user impact on Twitter
      + Validate a hypothesis: “User behaviour on a social platform reflects on user impact”
      + What parts of user behaviour are more relevant to a notion of user impact?
      + In this regard, how informative are the text inputs from the users?
      (Lampos, Aletras, Preotiuc-Pietro & Cohn, 2014)
  37. Defining an impact score (S) — a simplified definition
      $S(\phi_{\text{in}}, \phi_{\text{out}}, \phi_{\lambda}) = \ln\!\left( \frac{(\phi_{\lambda} + \theta)\,(\phi_{\text{in}} + \theta)^2}{\phi_{\text{out}} + \theta} \right)$
      + $\phi_{\text{in}}$: number of followers; $\phi_{\text{out}}$: number of followees; $\phi_{\lambda}$: number of times the account has been listed
      + $\theta = 1$, so the logarithm is applied on a positive number
      + $\phi_{\text{in}}^2 / \phi_{\text{out}} = (\phi_{\text{in}} - \phi_{\text{out}}) \times (\phi_{\text{in}} / \phi_{\text{out}}) + \phi_{\text{in}}$
      40K Twitter accounts (UK) considered; mean impact score $\mu(S) = 6.776$
      [Figure: histogram of the user impact scores in the data set, with @guardian, @David_Cameron, @PaulMasonNews, @lampos (Vasileios Lampos), @nikaletras (Nikolaos Aletras) and a suspected spam account marked]
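The impact score can be computed directly from the formula above; a small sketch (the example account numbers are made up for illustration):

```python
import math

def impact_score(followers, followees, listings, theta=1.0):
    """S = ln((listings + theta) * (followers + theta)^2 / (followees + theta)),
    the simplified impact score from the slide; theta = 1 keeps the argument
    of the logarithm positive."""
    return math.log((listings + theta) * (followers + theta) ** 2
                    / (followees + theta))

# example: an account with 2,000 followers, 300 followees, 50 listings
print(round(impact_score(2000, 300, 50), 3))
```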
  38. Impact prediction as a regression task
      [Bar chart: Pearson correlation (r) between predicted and actual impact for Ridge Regression and Gaussian Process models over three feature types (Attributes; Attributes & Words; Attributes & Topics); r ranges from 0.667 to 0.78]
      (Lampos, Aletras, Preotiuc-Pietro & Cohn, 2014)
  39. Some of the most important user attributes for impact (excl. topics)
      500 accounts with the lower (L) and higher (H) impacts for an attribute
      Impact plus
      + more tweets
      + more @-mentions
      + more links
      + less @-replies
      + less inactive days
      Impact minus
      + less tweets
      + less links
      + more inactive days
      [Figure: impact-score distributions of the L and H account groups for attributes such as tweets in entire history, unique @-mentions, links, @-replies and days with nonzero tweets, with the mean impact score for all users marked]
      (Lampos, Aletras, Preotiuc-Pietro & Cohn, 2014)
  40. We can guess the impact of a user from user activity, but can we infer his / her occupation?
  41. Inferring the occupational class of a Twitter user
      “Socioeconomic variables are influencing language use.” (Bernstein, 1960; Labov, 1972/2006)
      + Validate this hypothesis on a larger data set
      + Downstream applications
        + research (social science & other domains)
        + commercial
      + Proxy for income, socioeconomic class etc., i.e. further applications
      (Preotiuc-Pietro, Lampos & Aletras, 2015)
  42. Standard Occupational Classification (SOC, 2010)
      C1  Managers, Directors & Senior Officials — chief executive, bank manager
      C2  Professional Occupations — mechanical engineer, pediatrist, postdoc (!)
      C3  Associate Professional & Technical — system administrator, dispensing optician
      C4  Administrative & Secretarial — legal clerk, company secretary
      C5  Skilled Trades — electrical fitter, tailor
      C6  Caring, Leisure, Other Service — nursery assistant, hairdresser
      C7  Sales & Customer Service — sales assistant, telephonist
      C8  Process, Plant and Machine Operatives — factory worker, van driver
      C9  Elementary — shelf stacker, bartender
      Google “ONS” AND “SOC” for more information
  43. Data
      + 5,191 users mapped to their occupations, then mapped to one of the 9 SOC categories — manual (!) labelling
      + 10 million tweets
      + Get processed data: http://www.sas.upenn.edu/~danielpr/jobs.tar.gz
      [Bar chart: % of users per SOC category (C1-C9)]
  44. Features
      User attributes (18)
      + number of followers, friends, listings, follower/friend ratio, favourites, tweets, retweets, hashtags, @-mentions, @-replies, links and so on
      Topics — word clusters (200)
      + SVD on the graph Laplacian of the word x word similarity matrix using normalised PMI, i.e. a form of spectral clustering (Bouma, 2009; von Luxburg, 2007)
      + Skip-gram model with negative sampling to learn word embeddings (Word2Vec); pairwise cosine similarity on the embeddings to derive a word x word similarity matrix; then spectral clustering on the resulting matrix (Mikolov et al., 2013)
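A rough sketch of the second clustering route above (embeddings, cosine similarity, spectral clustering) using scikit-learn's `SpectralClustering` with a precomputed affinity. The random embeddings, the clipping of negative similarities and the illustrative vocabulary are assumptions, not the exact recipe of the paper.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering

# embeddings: V x d matrix of word vectors (e.g. from a trained skip-gram
# / Word2Vec model); `words` is the matching vocabulary. Placeholders here.
embeddings = np.random.rand(1000, 100)
words = [f"w{i}" for i in range(1000)]

sim = cosine_similarity(embeddings)
sim = np.clip(sim, 0, None)          # keep a non-negative affinity matrix (an assumption)

clustering = SpectralClustering(n_clusters=200, affinity="precomputed",
                                assign_labels="kmeans", random_state=0)
labels = clustering.fit_predict(sim)

# words grouped into 'topics', as on the slide
topic_0 = [w for w, l in zip(words, labels) if l == 0]
```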
  45. Occupational class (9-way) classification
      [Bar chart: accuracy (%) for Logistic Regression, SVM (RBF) and Gaussian Process classifiers over three feature types (User Attributes, SVD-200-clusters, Word2Vec-200-clusters); accuracies range from 31.5% to 52.7%, compared against the majority-class baseline]
  46. Topics
      Manual label (rank): most central words; most frequent words
      Arts (1): archival, stencil, canvas, minimalist; art, design, print
      Health (2): chemotherapy, diagnosis, disease; risk, cancer, mental, stress
      Beauty Care (3): exfoliating, cleanser, hydrating; beauty, natural, dry, skin
      Higher Education (4): undergraduate, doctoral, academic, students, curriculum; students, research, board, student, college, education, library
      Software Engineering (5): integrated, data, implementation, integration, enterprise; service, data, system, services, access, security
      Football (7): bardsley, etherington, gallas; van, foster, cole, winger
      Corporate (8): consortium, institutional, firm’s; patent, industry, reports
      Cooking (9): parmesan, curried, marinated, zucchini; recipe, meat, salad
      Elongated Words (12): yaaayy, wooooo, woooo, yayyyyy, yaaaaay, yayayaya, yayy; wait, till, til, yay, ahhh, hoo, woo, woot, whoop, woohoo
      Politics (16): religious, colonialism, christianity, judaism, persecution, fascism, marxism; human, culture, justice, religion, democracy
  47. Discussion topics per occupational class — CDF plots
      Plots explained: a topic is more prevalent in a class (C1-C9) if its line leans closer to the bottom-right corner of the plot
      Upper classes
      + Higher Education
      + Corporate
      + Politics
      Lower classes
      + Beauty Care
      + Elongated Words
      [Figure: CDF plots of user probability vs. topic proportion per class for the topics Higher Education (#21), Corporate (#124), Politics (#176), Arts (#116), Beauty Care (#153) and Elongated Words (#164)]
  48. Left: Distance (Jensen-Shannon divergence) between topic distributions for the different occupational classes, depicted on a heatmap
      Right: Comparison of mean topic usage between supersets of occupational classes (1-2 vs. 6-9)
      Mean topic usage, classes 1-2 vs. 6-9 (all values multiplied by 10^3):
      Arts: 4.95 vs. 2.79
      Health: 4.45 vs. 2.13
      Beauty Care: 1.40 vs. 2.24
      Higher Education: 6.04 vs. 2.56
      Software Engineering: 6.31 vs. 2.54
      Football: 0.54 vs. 0.52
      Corporate: 5.15 vs. 1.41
      Cooking: 2.81 vs. 2.49
      Elongated Words: 1.90 vs. 3.78
      Politics: 2.14 vs. 1.06
      The difference between the topic usage distributions was statistically significant (p < 10^-5).
      [Figure: Jensen-Shannon divergence in the topic distributions between the different occupational classes (C1-9), indicating a topic divergence between users in the lower vs. higher skilled occupational classes]
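The Jensen-Shannon divergence used for the heatmap can be computed with SciPy. Note that `scipy.spatial.distance.jensenshannon` returns the JS distance (the square root of the divergence), so it is squared here; the two example distributions are made up.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions."""
    return jensenshannon(np.asarray(p), np.asarray(q), base=2) ** 2

# hypothetical mean topic-usage distributions for two occupational classes
c1 = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
c9 = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
print(js_divergence(c1, c9))
```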
  49. concluding…
      Extracting interesting concepts from large-scale textual data
  50. Conclusions
      Publicly available, user-generated content can be used to better understand:
      + collective emotion
      + disease rates or the magnitude of some target events
      + voting intentions
      + user attributes (impact, occupation)
      A number of studies (too many to cite) have attempted different — sometimes improved — approaches to the methods presented here.
      Many studies have also explored different data mining scenarios (e.g. infer user gender, financial indices etc.).
  51. Some of the challenges ahead
      + Work closer with domain experts (social scientists to epidemiologists)
        e.g. in collaboration with Public Health England we proposed a method for assessing the impact of a health intervention through social media and search query data (Lampos, Yom-Tov, Pebody & Cox, 2015)
      + Understand better the biases of the online media (when it is desirable to draw more general conclusions)
        note that sometimes these biases may be a good thing
      + Attack more interesting (usually more complex) questions
        e.g. generalise the inference of offline from online behaviour
      + Improve on existing methods
  52. Collaborators participating in the work presented today (in alphabetical order)
      Alberto Acerbi: Anthropology, Eindhoven University of Technology
      Nikolaos Aletras: Natural Language Processing, University College London
      Alex Bentley: Anthropology & Archaeology, University of Bristol
      Trevor Cohn: Natural Language Processing, University of Melbourne
      Nello Cristianini: Artificial Intelligence, University of Bristol
      Tijl De Bie: Computational Pattern Analysis, University of Bristol
      Philip Garnett: Complex Systems, University of York
      Thomas Lansdall-Welfare: Computer Science, University of Bristol
      Paul Ormerod: Decision Making and Uncertainty, University College London
      Daniel Preotiuc-Pietro: Natural Language Processing, University of Pennsylvania
  53. Thank you!
      slides available at http://www.lampos.net/sites/default/files/slides/ACA2015.pdf
      Extracting interesting concepts from large-scale textual data
  54. Bonus slides
      Extracting interesting concepts from large-scale textual data
  55. Training Bilinear Elastic Net (BEN)
      BEN’s objective function:
      $\operatorname*{argmin}_{\mathbf{u},\mathbf{w},\beta} \Big\{ \sum_{i=1}^{n} \big( \mathbf{u}^{\mathsf{T}} \mathbf{Q}_i \mathbf{w} + \beta - y_i \big)^2 + \lambda_{u_1} \|\mathbf{u}\|_{\ell_2}^2 + \lambda_{u_2} \|\mathbf{u}\|_{\ell_1} + \lambda_{w_1} \|\mathbf{w}\|_{\ell_2}^2 + \lambda_{w_2} \|\mathbf{w}\|_{\ell_1} \Big\}$
      Biconvex problem
      + fix u, learn w and vice versa
      + iterate through convex optimisation tasks: convergence (Al-Khayyal & Falk, 1983; Horst & Tuy)
      Large-scale solvers available
      + FISTA implemented in the SPAMS library (Beck & Teboulle, 2009; Mairal et al., 2010)
      [Figure: global objective function during training (red) and corresponding prediction error (RMSE) on held-out data (blue) per optimisation step]
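A self-contained sketch of the biconvex, alternating training scheme described above: with u fixed the model is linear in w (and vice versa), so each half-step reduces to an ordinary elastic-net regression. The original work uses FISTA via the SPAMS library; scikit-learn's `ElasticNet` is substituted here purely to keep the sketch runnable, and the hyper-parameters and iteration count are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def train_ben(Q, y, n_iters=10, alpha=0.01, l1_ratio=0.5, seed=0):
    """Alternating (biconvex) training sketch for the Bilinear Elastic Net.
    Q: list of n user-by-word matrices (p x m); y: n responses."""
    rng = np.random.default_rng(seed)
    p, m = Q[0].shape
    u, w, beta = rng.random(p), rng.random(m), 0.0
    for _ in range(n_iters):
        # fix u, learn w (and beta): features are Q_i^T u
        Zw = np.array([Q_i.T @ u for Q_i in Q])          # n x m design matrix
        reg_w = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(Zw, y)
        w, beta = reg_w.coef_, reg_w.intercept_
        # fix w, learn u (and beta): features are Q_i w
        Zu = np.array([Q_i @ w for Q_i in Q])            # n x p design matrix
        reg_u = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(Zu, y)
        u, beta = reg_u.coef_, reg_u.intercept_
    return u, w, beta
```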
  56. Bilinear modelling of EU unemployment via news summaries
      [Diagram: model schematic combining word frequency, word and outlet weights, and polarity]
      (Lampos et al., 2014)
  57. More information about Gaussian Processes
      + non-linear, kernelised, non-parametric, modular
      + applicable in both regression and classification scenarios
      + interpretable
      Pointers
      + Book — “Gaussian Processes for Machine Learning”: http://www.gaussianprocess.org/gpml/
      + Tutorial — “Gaussian Processes for Natural Language Processing”: http://people.eng.unimelb.edu.au/tcohn/tutorial.html
      + Video-lecture — “Gaussian Process Basics”: http://videolectures.net/gpip06_mackay_gpb/
      + Software I — GPML for Octave or MATLAB: http://www.gaussianprocess.org/gpml/code
      + Software II — GPy for Python: http://sheffieldml.github.io/GPy/
      (Rasmussen & Williams, 2006)
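Beyond the pointers above, a minimal Gaussian Process regression example (here with scikit-learn rather than GPML or GPy, purely for a self-contained sketch) on stand-in user-attribute features:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# toy stand-in for user-attribute features and a continuous target (e.g. impact)
rng = np.random.default_rng(0)
X = rng.random((200, 18))                        # e.g. 18 user attributes
y = X @ rng.random(18) + 0.1 * rng.standard_normal(200)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)
mean, std = gp.predict(X[:5], return_std=True)   # predictive mean and uncertainty
```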
  58. Occupational class (9-way) classification confusion matrix
      [Figure: confusion matrix of the prediction results, predicted vs. actual class]
  59. References
      Acerbi, Lampos, Garnett & Bentley. The Expression of Emotions in 20th Century Books. PLoS ONE, 2013.
      Bach. Bolasso: Model Consistent Lasso Estimation through the Bootstrap. ICML, 2008.
      Beck & Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. J. Imaging. Sci., 2009.
      Bentley, Acerbi, Ormerod & Lampos. Books Average Previous Decade of Economic Misery. PLoS ONE, 2014.
      Bouma. Normalized (pointwise) mutual information in collocation extraction. GSCL, 2009.
      Caruana. Multi-task Learning. Machine Learning, 1997.
      Labov. The Social Stratification of English in New York City, 1972; 2nd ed. 2006.
      Lampos & Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP, 2010.
      Lampos & Cristianini. Nowcasting Events from the Social Web with Statistical Learning. ACM TIST, 2012.
      Lampos, De Bie & Cristianini. Flu Detector - Tracking Epidemics on Twitter. ECML PKDD, 2010.
      Lampos, Preotiuc-Pietro & Aletras. Predicting and characterising user impact on Twitter. EACL, 2014.
      Lampos, Preotiuc-Pietro & Cohn. A user-centric model of voting intention from Social Media. ACL, 2013.
      Lampos, Yom-Tov, Pebody & Cox. Assessing the impact of a health intervention via user-generated Internet content. DMKD, 2015.
      Lampos et al. Extracting Socioeconomic Patterns from the News: Modelling Text and Outlet Importance Jointly. ACL LACSS, 2014.
      Mairal, Jenatton, Obozinski & Bach. Network Flow Algorithms for Structured Sparsity. NIPS, 2010.
      Mikolov, Chen, Corrado & Dean. Efficient estimation of word representations in vector space. ICLR, 2013.
      Pennebaker et al. Linguistic Inquiry and Word Count: LIWC2007, Tech. Rep., 2007.
      Preotiuc-Pietro, Lampos & Aletras. An analysis of the user occupational class through Twitter content. ACL, 2015.
      Rasmussen & Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
      Strapparava & Valitutti. Wordnet-Affect: An affective extension of WordNet. LREC, 2004.
      Tibshirani. Regression shrinkage and selection via the lasso. JRSS Series B (Method.), 1996.
      von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 2007.
      Zhao & Yu. On model selection consistency of lasso. JMLR, 2006.
