SlideShare a Scribd company logo
1 of 45
Download to read offline
Bilinear Text Regression
and Applications
Vasileios Lampos
Department of Computer Science
University College London
March, 2015
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 1/45
1
/45
Outline
⊥ Linear Regression Methods
Bilinear Regression Methods
Applications
|= Conclusions
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 2/45
2
/45
Recap on regression methods
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 3/45
3
/45
Regression basics — Ordinary Least Squares (1/2)
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
Ordinary Least Squares (OLS)
argmin
www,β
n
i=1

yi − β −
m
j=1
xijwj


2
or in matrix form
argmin
www∗
XXX∗www∗ − yyy 2
2
, where XXX∗ = [XXX diag (III)]
⇒ www∗ = XXXT
∗ XXX∗
−1
XXXT
∗ yyy
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 4/45
4
/45
Regression basics — Ordinary Least Squares (2/2)
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
Ordinary Least Squares (OLS)
argmin
www∗
XXX∗www∗ − yyy 2
2
⇒ www∗ = XXXT
∗ XXX∗
−1
XXXT
∗ yyy
Why not?
−−− XXXT
∗ XXX∗ may be singular (thus difficult to invert)
−−− high-dimensional models difficult to interpret
−−− unsatisfactory prediction accuracy (estimates have large variance)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 5/45
5
/45
Regression basics — Ridge Regression (1/2)
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
Ridge Regression (RR)
www∗ = XXXT
∗ XXX∗ + λIII
non singular
−1
XXXT
∗ yyy (Hoerl & Kennard, 1970)
argmin
www,β



n
i=1

yi − β −
m
j=1
xijwj


2
+ λ
m
j=1
w2
j



or argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 2
2
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 6/45
6
/45
Regression basics — Ridge Regression (2/2)
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
Ridge Regression (RR)
argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 2
2
+++ size constraint on the weight coefficients (regularisation)
→ resolves problems caused by collinear variables
+++ less degrees of freedom, better predictive accuracy than OLS
−−− does not perform feature selection (nonzero coefficients)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 7/45
7
/45
Regression basics — Lasso
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
111–norm regularisation or lasso (Tibshirani, 1996)
argmin
www,β



n
i=1

yi − β −
m
j=1
xijwj


2
+ λ
m
j=1
|wj|



or argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 1
−−− no closed form solution — quadratic programming problem
+++ Least Angle Regression explores entire reg. path (Efron et al., 2004)
+++ sparse www, interpretability, better performance (Hastie et al., 2009)
−−− if m > n, at most n variables can be selected
−−− strongly corr. predictors → model-inconsistent (Zhao & Yu, 2009)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 8/45
8
/45
Regression basics — Lasso for Text Regression
• n-gram frequencies xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• flu rates yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
111–norm regularisation or lasso
or argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 1
‘unwel’, ‘temperatur’, ‘headach’, ‘appetit’, ‘symptom’, ‘diarrhoea’, ‘muscl’, ‘feel’, ...
180 200 220 240 260 280 300 320 340
0
50
100
Day Number (2009)
Flurate
HPA
Inferred
0 10 20 30 40 50 60 70 80 90
0
50
100
150
Days
Flurate
HPA
Inferred
A B C D E
Figure 1 : Flu rate predictions for the UK by applying lasso on Twitter data
(Lampos & Cristianini, 2010)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 9/45
9
/45
Regression basics — Elastic Net
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
[Linear] Elastic Net (LEN)
(Zhou & Hastie, 2005)
argmin
www∗



XXX∗www∗ − yyy 2
2
OLS
+ λ1 www 2
2
RR reg.
+ λ2 www 1
Lasso reg.



+++ ‘compromise’ between ridge regression (handles collinear
predictors) and lasso (favours sparsity)
+++ entire reg. path can be explored by modifying LAR
+++ if m > n, number of selected variables not limited to n
−−− may select redundant variables!
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 10/45
10
/45
Would a slightly different text regression approach
be more suitable for Social Media content?
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 11/45
11
/45
About Twitter (1/2)
Tweet Examples
@PaulLondon: I would strongly support a coalition government. It is the best
thing for our country right now. #electionsUK2010
@JohnsonMP: Socialism is something forgotten in our country #supportLabour
@FarageNOT: Far-right ‘movements’ come along with crises in capitalism #UKIP
@JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!!
Bieber rules, peace and love ♥ ♥ ♥
The Twitter basics:
• 140 characters per status (tweet)
• users follow and be followed
• embedded usage of topics (#elections)
• retweets (RT), @replies, @mentions, favourites
• real-time nature
• biased user demographics
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 12/45
12
/45
About Twitter (2/2)
Tweet Examples
@PaulLondon: I would strongly support a coalition government. It is the best
thing for our country right now. #electionsUK2010
@JohnsonMP: Socialism is something forgotten in our country #supportLabour
@FarageNOT: Far-right ‘movements’ come along with crises in capitalism #UKIP
@JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!!
Bieber rules, peace and love ♥ ♥ ♥
• contains a vast amount of information about various topics
• this information (XXX) can be used to assist predictions (yyy)
(Lampos & Cristianini, 2012; Sakaki et al., 2010; Bollen et al., 2011)
−−− f : XXX → yyy, f usually formulates a linear regression task
−−− XXX represents word frequencies only...
+++ is it possible to incorporate a user contribution somehow?
word selection +++ user selection
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 13/45
13
/45
Bi-linear Text Regression
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 14/45
14
/45
Bilinear Text Regression — The general idea (1/2)
Linear regression: f (xxxi) = xxxT
i www + β
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
Bilinear regression: f (QQQi) = uuuTQQQiwww + β
• users p ∈ Z+
• observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias uk, wj, β ∈ R, k ∈ {1, ..., p} — uuu, www, β
j ∈ {1, ..., m}
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 15/45
15
/45
Bilinear Text Regression — The general idea (2/2)
• users p ∈ Z+
• observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias uk, wj, β ∈ R, k ∈ {1, ..., p} — uuu, www, β
j ∈ {1, ..., m}
f (QQQi) = uuuTQQQiwww + β
× × + β
uuuT QQQi
www
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 16/45
16
/45
Bilinear Text Regression — Regularisation
• users p ∈ Z+
• observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias uk, wj, β ∈ R, k ∈ {1, ..., p} — uuu, www, β
j ∈ {1, ..., m}
argmin
uuu,www,β
n
i=1
uuuT
QQQiwww + β − yi
2
+ ψ(uuu, θu) + ψ(www, θw)
ψ(·): regularisation function with a set of hyper-parameters (θ)
• if ψ (vvv, λ) = λ vvv 1 Bilinear Lasso
• if ψ (vvv, λ1, λ2) = λ1 vvv 2
2
+ λ2 vvv 1 Bilinear Elastic Net (BEN)
(Lampos et al., 2013)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 17/45
17
/45
Bilinear Elastic Net (BEN)
argmin
uuu,www,β
n
i=1
uuuT
QQQiwww + β − yi
2
BEN’s objective function
+ λu1 uuu 2
2
+ λu2 uuu 1
+ λw1 www 2
2
+ λw2 www 1
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
0
0.4
0.8
1.2
1.6
2
2.4
Step
Global Objective
RMSE
Figure 2 : Objective function
value and RMSE (on hold-out
data) through the model’s
iterations
• Bi-convexity: fix uuu, learn www and vv
• Iterating through convex
optimisation tasks: convergence
(Al-Khayyal & Falk, 1983; Horst & Tuy, 1996)
• FISTA (Beck & Teboulle, 2009)
in SPAMS (Mairal et al., 2010):
Large-scale optimisation solver,
quick convergence
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 18/45
18
/45
Multi-Task Learning
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 19/45
19
/45
Multi-Task Learning
What
• Instead of learning/optimising a single task (one target variable)
• ... optimise multiple tasks jointly
Why (Caruana, 1997)
• improves generalisation performance exploiting domain-specific
information of related tasks
• a good choice for under-sampled distributions — knowledge
transfer
• application-driven reasons (e.g. explore interplay between political
parties)
How
• Multi-task regularised regression
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 20/45
20
/45
The 2,12,12,1-norm regularisation
WWW 2,1 =
m
j=1
WWWj 2 , where WWWj denotes the j-th row
2,12,12,1-norm regularisation
argmin
WWW,βββ



XXXWWW − YYY 2
F
+ λ
m
j=1
WWWj 2



• multi-task learning: instead of www ∈ Rm, learn WWW ∈ Rm×τ , where τ
is the number of tasks
• 2,1-norm regularisation, i.e. the sum of WWW’s row 2-norms (Argyriou
et al., 2008; Liu et al., 2009) extends the notion of group lasso (Yuan &
Lin, 2006)
• group lasso: instead of single variables, selects groups of variables
• ‘groups’ now become the τ-dimensional rows of WWW
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 21/45
21
/45
Bilinear +++ Multi-Task Learning
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 22/45
22
/45
Bilinear Multi-Task Learning
• tasks τ ∈ Z+
• users p ∈ Z+
• observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX
• responses yyyi ∈ Rτ , i ∈ {1, ..., n} — YYY
• weights, bias uuuk,wwwj,βββ ∈ Rτ , k ∈ {1, ..., p} — UUU, WWW, βββ
j ∈ {1, ..., m}
f (QQQi) = tr UUUTQQQiWWW + βββ
× ×
UUUT QQQi WWW
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 23/45
23
/45
Bilinear Group 2,12,12,1 (BGL) (1/2)
• tasks τ ∈ Z+
• users p ∈ Z+
• observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX
• responses yyyi ∈ Rτ , i ∈ {1, ..., n} — YYY
• weights, bias uuuk,wwwj,βββ ∈ Rτ , k ∈ {1, ..., p} — UUU, WWW, βββ
j ∈ {1, ..., m}
argmin
UUU,WWW,βββ
τ
t=1
n
i=1
uuuT
t QQQiwwwt + βt − yti
2
+ λu
p
k=1
UUUk 2 + λw
m
j=1
WWWj 2
• BGL can be broken into 2 convex tasks: first learn {WWW,βββ}, then
{UUU,βββ} and vv + iterate through this process
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 24/45
24
/45
Bilinear Group 2,12,12,1 (BGL) (2/2)
argmin
UUU,WWW,βββ
τ
t=1
n
i=1
uuuT
t QQQiwwwt + βt − yti
2
+ λu
p
k=1
UUUk 2 + λw
m
j=1
WWWj 2
× ×
UUUT QQQi WWW
• a feature (user/word) is selected for all tasks (not just one), but
possibly with different weights
• especially useful in the domain of politics (e.g. user pro party A,
against party B)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 25/45
25
/45
Voting Intention Modelling
(Lampos et al., 2013)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 26/45
26
/45
Political Opinion/Voting Intention Mining — Brief Recap
Primary papers
• predict the result of an election via Twitter (Tumasjan et al., 2010)
• model socio-political sentiment polls (O’Connor et al., 2010)
• above 2 failed on 2009 US congr. elections (Gayo-Avello, 2011)
• desired properties of such models (Metaxas et al., 2011)
Features
• lexicon-based, e.g. using LIWC (Tausczik & Pennebaker, 2010)
• task-specific keywords (names of parties, politicians)
• tweet volume
reviewed in (Gayo-Avello, 2013)
−−− political descriptors change in time, differ per country
−−− personalised modelling (present in actual polls) missing
−−− multi-task learning?
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 27/45
27
/45
Voting Intention Modelling — Data (United Kingdom)
• 42K users distributed proportionally to regional population figures
• 60m tweets from 30/04/2010 to 13/02/2012
• 80, 976 unigrams (word features)
• 240 voting intention polls (YouGov)
• 3 parties: Conservatives (CON), Labour Party (LAB), Liberal
Democrats (LIB)
• main language: English
5 30 55 80 105 130 155 180 205 230
0
5
10
15
20
25
30
35
40
45
VotingIntention%
Time
CON
LAB
LIB
Figure 3 : Voting intention time series for the UK (YouGov)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 28/45
28
/45
Voting Intention Modelling — Data (Austria)
• 1.1K users manually selected by Austrian political analysts (SORA)
• 800K tweets from 25/01 to 01/12/2012
• 22, 917 unigrams (word features)
• 98 voting intention polls from various pollsters
• 4 parties: Social Democratic Party (SP¨O), People’s Party (¨OVP),
Freedom Party (FP¨O), Green Alternative Party (GR¨U)
• main language: German
5 20 35 50 65 80 95
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
Figure 4 : Voting intention time series for Austria
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 29/45
29
/45
Voting Intention Modelling — Evaluation
• 10-fold validation
−−− train a model using data based on a set of contiguous polls A
−−− test on the next D = 5 polls
−−− expand training set to {A ∪ D}, test on the next |D | = 5 polls
• realistic scenario: train on past, predict future polls
• overall we test predictions on 50 polls (in each case study)
Baselines
• Bµµµ: constant prediction based on µ(yyy) in the training set
• Blast: constant prediction based on last(yyy) in the training set
• LEN: (linear) Elastic Net prediction (using word frequencies)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 30/45
30
/45
Voting Intention Modelling — Performance tables
Average RMSEs on the voting intention percentage predictions in
the 10-step validation process
Table 1 : UK case study
CON LAB LIB µµµ
Bµµµ 2.272 1.663 1.136 1.69
Blast 2 2.074 1.095 1.723
LEN 3.845 2.912 2.445 3.067
BEN 1.939 1.644 1.136 1.573
BGL 1.7851.7851.785 1.5951.5951.595 1.0541.0541.054 1.4781.4781.478
Table 2 : Austrian case study
SP¨O ¨OVP FP¨O GR¨U µµµ
Bµµµ 1.535 1.373 3.3 1.197 1.851
Blast 1.1481.1481.148 1.556 1.6391.6391.639 1.536 1.47
LEN 1.291 1.286 2.039 1.1521.1521.152 1.442
BEN 1.392 1.31 2.89 1.205 1.699
BGL 1.619 1.0051.0051.005 1.757 1.374 1.4391.4391.439
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 31/45
31
/45
Voting Intention Modelling — Prediction figures
Polls BEN BGL
UK
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
35
40
VotingIntention%
Time
CON
LAB
LIB
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
35
40
VotingIntention%
Time
CON
LAB
LIB
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
35
40
VotingIntention%
Time
CON
LAB
LIB
Austria
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
Figure 5 : Performance figures for BEN and BGL in the UK/Austria case studies
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 32/45
32
/45
Voting Intention Modelling — Qualitative evaluation
Party Tweet Score Author
CON PM in friendly chat with top EU mate, Sweden’s Fredrik Reinfeldt, before
family photo
1.334 Journalist
Have Liberal Democrats broken electoral rules? Blog on Labour com-
plaint to cabinet secretary
−0.991 Journalist
LAB I am so pleased to hear Paul Savage who worked for the Labour group
has been Appointed the Marketing manager for the baths hall GREAT
NEWS
−0.552 Politician
(Labour)
LBD RT @user: Must be awful for TV bosses to keep getting knocked back
by all the women they ask to host election night (via @user)
0.874 LibDem
MP
SP¨O Inflationsrate in ¨O. im Juli leicht gesunken: von 2,2 auf 2,1%. Teurer
wurde Wohnen, Wasser, Energie.
Translation: Inflation rate in Austria slightly down in July from 2,2 to
2,1%. Accommodation, Water, Energy more expensive.
0.745 Journalist
¨OVP kann das buch “res publica” von johannes #voggenhuber wirklich
empfehlen! so zum nachdenken und so... #europa #demokratie
Translation: can really recommend the book “res publica” by johannes
#voggenhuber! Food for thought and so on #europe #democracy
−2.323 User
FP¨O Neue Kampagne der #Krone zur #Wehrpflicht: “GIB BELLO EINE
STIMME!”
Translation: New campaign by the #Krone on #Conscription: “GIVE
WOOFY A VOICE!”
7.44 Political
satire
GR¨U Protestsong gegen die Abschaffung des Bachelor-Studiums Interna-
tionale Entwicklung: <link> #IEbleibt #unibrennt #uniwut
Translation: Protest songs against the closing-down of the bachelor
course of International Development: <link> #IDremains #uniburns
#unirage
1.45 Student
Union
Table 3 : Scored tweet examples from both case studies using BGL
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 33/45
33
/45
Extracting Socioeconomic Patterns
from the News
(Lampos et al., 2014)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 34/45
34
/45
Socioeconomic Patterns — Data
News Summaries
• Open Europe Think Tank: summaries of news articles on EU or
member countries (focus on politics, perhaps right-wing biased!)
• from February 2006 to mid-November 2013
1913 days or 94 months or 888 years
• involving 435435435 international news outlets
• extracted 8, 413 unigrams and 19, 045 bigrams
Socioeconomic Indicators
• EU Economic Sentiment Indicator (ESI)
→ predictor for future economic developments (Gelper & Croux, 2010)
→ consists of 5 weighted confidence sub-indicators:
◦ industrial (40%), services (30%), consumer (20%)
construction (5%), retail trade (5%)
• EU Unemployment — seasonally adjusted ratio of the non
employed over the entire EU labour force
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 35/45
35
/45
Socioeconomic Patterns — Task description
+++ qualitative differences to voting intention modelling
◦ aim is NOT to predict socioeconomic indicators
◦ characterise news by conducting a supervised analysis on them driven
by socioeconomic factors
+++ use predictive performance as an informal guarantee that the
model is reasonable
+++ the better the predictive performance, the more trustful the
extracted patterns should be
Slightly modified BEN
argmin
ooo≥0,www,β
n
i=1
oooT
QQQiwww + β − yi
2
+ λo1 ooo 2
2
+ λo2 ooo 1
+ λw1 www 2
2
+ λw2 www 1
• min (ooo) ≥ 0 to enhance weight interpretability for both news
outlets and n-grams
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 36/45
36
/45
Socioeconomic Patterns — Predictive performance
• similar evaluation as in voting intention prediction
• differences: time frame is now a month, train using a moving
window of 64 contiguous months, test on the next 3 months
• make predictions for a total of 30 months
2007 2008 2009 2010 2011 2012 2013
0
50
100
actual
predictions
2007 2008 2009 2010 2011 2012 2013
0
5
10
actual
predictions
Figure 6 : Monthly rates of EU-wide ESI (right) and Unemployment (left)
together with BEN’s predictions for the last 30 months
ESI Unemployment
LEN 9.253 (9.89%) 0.9275 (8.75%)
BEN 8.2098.2098.209 (8.77%) 0.90470.90470.9047 (8.52%)
Table 4 : 10-fold validation average RMSEs (and error rates) for LEN and BEN
on ESI and unemployment rates prediction
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 37/45
37
/45
Socioeconomic Patterns — Qualitative analysis (ESI)
Frequency
Word
Outlet
Weight
a a
Polarity
Yes
Yes
+ -
Figure 7 : Visualisation of BEN’s outputs for EU’s ESI in the last fold (i.e. model
trained on 64 months up to August 2013). The word cloud depicts the top-60
positively and negatively weighted n-grams (120) in total together with the top-30
outlets.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 38/45
38
/45
Socioeconomic Patterns — Qualitative analysis (Unempl.)
Frequency
Word
Outlet
Weight
a a
Polarity
Yes
Yes
+ -
Figure 8 : Visualisation of BEN’s outputs for EU-Unemployment in the last fold
(i.e. model trained on 64 months up to August 2013). The word cloud depicts the
top-60 positively and negatively weighted n-grams (120) in total together with the
top-30 outlets.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 39/45
39
/45
Conclusions
+++ introduced a new class of methods for bilinear text regression
+++ directly applicable to Social Media content
+++ or other types of textual content such as news articles
+++ better predictive performance than the linear alternative (in the
investigated case studies)
+++ extended to bilinear multi-task learning
To do
−−− investigate finer grained modelling settings by applying different
regularisation functions (or different combinations of them)
−−− further understand the properties of bilinear versus linear text
regression, e.g. when and why is it a good choice or how different
combinations of regularisation settings affect performance
−−− task-specific improvements
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 40/45
40
/45
In collaboration with
Trevor Cohn, University of Melbourne
Daniel Preot¸iuc-Pietro, University of Pennsylvania
Sina Samangooei, Amazon Research
Douwe Gelling, University of Sheffield
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 41/45
41
/45
Thank you
Any questions?
Download the slides from
http://www.lampos.net/research/talks-posters
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 42/45
42
/45
References I
Al-Khayyal and Falk. Jointly Constrained Biconvex Programming. MOR, 1983.
Argyriou, Evgeniou and Pontil. Convex multi-task feature learning. Machine Learning,
2008.
Beck and Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear
Inverse Problems. J. Imaging Sci., 2009.
Bermingham and Smeaton. On using Twitter to monitor political sentiment and predict
election results. SAAIP, 2011.
Bollen, Mao and Zeng. Twitter mood predicts the stock market. JCS, 2011.
Caruana. Multitask Learning. Machine Learning, 1997.
Efron, Hastie, Johnstone and Tibshirani. Least Angle Regression. The Annals of
Statistics, 2004.
Gayo-Avello. A Meta-Analysis of State-of-the-Art Electoral Prediction From Twitter
Data. SSCR, 2013.
Gayo-Avello, Metaxas and Mustafaraj. Limits of Electoral Predictions using Twitter.
ICWSM, 2011.
Gelper and Croux. On the construction of the European Economic Sentiment Indicator.
OBES, 2010.
Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. 2009.
Hoerl and Kennard. Ridge regression: Biased estimation for nonorthogonal problems.
Technometrics, 1970.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 43/45
43
/45
References II
Horst and Tuy. Global Optimization: Deterministic Approaches. 1996.
Lampos and Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP,
2010.
Lampos and Cristianini. Nowcasting Events from the Social Web with Statistical
Learning. ACM TIST, 2012.
Lampos, Preot¸iuc-Pietro and Cohn. A user-centric model of voting intention from Social
Media. ACL, 2013.
Lampos, Preot¸iuc-Pietro, Samangooei, Gelling and Cohn. Extracting Socioeconomic
Patterns from the News: Modelling Text and Outlet Importance Jointly. ACL
LACSS, 2014.
Liu, Ji and Ye. Multi-task feature learning via efficient 2,12,12,1-norm minimization. UAI,
2009.
Mairal, Jenatton, Obozinski and Bach. Network Flow Algorithms for Structured Sparsity.
NIPS, 2010.
Metaxas, Mustafaraj and Gayo-Avello. How (not) to predict elections. SocialCom, 2011.
O’Connor, Balasubramanyan, Routledge and Smith. From Tweets to polls: Linking text
sentiment to public opinion time series. ICWSM, 2010.
Pirsiavash, Ramanan and Fowlkes. Bilinear classifiers for visual recognition. NIPS, 2009.
Quesada and Grossmann. A global optimization algorithm for linear fractional and
bilinear programs. JGO, 1995.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 44/45
44
/45
References III
Sakaki, Okazaki and Matsuo. Earthquake shakes Twitter users: real-time event
detection by social sensors. WWW, 2010.
Tausczik and Pennebaker. The Psychological Meaning of Words: LIWC and
Computerized Text Analysis Methods. JLSP, 2010.
Tibshirani. Regression Shrinkage and Selection via the LASSO. JRSS, 1996.
Tumasjan, Sprenger, Sandner and Welpe. Predicting elections with Twitter: What 140
characters reveal about political sentiment. ICWSM, 2010.
Yuan and Lin. Model selection and estimation in regression with grouped variables.
JRSS, 2006.
Zhao and Yu. On model selection consistency of LASSO. JMLR, 2006.
Zhou and Hastie. Regularization and variable selection via the elastic net. JRSS, 2005.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 45/45
45
/45

More Related Content

What's hot

Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 Multiplicative Decompositions of Stochastic Distributions and Their Applicat... Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
Multiplicative Decompositions of Stochastic Distributions and Their Applicat...Toshiyuki Shimono
 
A Society of AI Agents
A Society of AI AgentsA Society of AI Agents
A Society of AI AgentsJun Wang
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learningsimaokasonse
 
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)STAIR Lab, Chiba Institute of Technology
 
Bayesian Dark Knowledge and Matrix Factorization
Bayesian Dark Knowledge and Matrix FactorizationBayesian Dark Knowledge and Matrix Factorization
Bayesian Dark Knowledge and Matrix FactorizationPreferred Networks
 
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualLecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualStéphane Canu
 
Query Suggestion @ tokyotextmining#2
Query Suggestion @ tokyotextmining#2Query Suggestion @ tokyotextmining#2
Query Suggestion @ tokyotextmining#2ybenjo
 
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...zukun
 
Convolution Neural Network
Convolution Neural NetworkConvolution Neural Network
Convolution Neural NetworkAmit Kushwaha
 
Thesis defense improved
Thesis defense improvedThesis defense improved
Thesis defense improvedZheng Mengdi
 

What's hot (12)

Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 Multiplicative Decompositions of Stochastic Distributions and Their Applicat... Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 
A Society of AI Agents
A Society of AI AgentsA Society of AI Agents
A Society of AI Agents
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learning
 
Lecture5 kernel svm
Lecture5 kernel svmLecture5 kernel svm
Lecture5 kernel svm
 
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
 
Bayesian Dark Knowledge and Matrix Factorization
Bayesian Dark Knowledge and Matrix FactorizationBayesian Dark Knowledge and Matrix Factorization
Bayesian Dark Knowledge and Matrix Factorization
 
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualLecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dual
 
Query Suggestion @ tokyotextmining#2
Query Suggestion @ tokyotextmining#2Query Suggestion @ tokyotextmining#2
Query Suggestion @ tokyotextmining#2
 
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
 
Convolution Neural Network
Convolution Neural NetworkConvolution Neural Network
Convolution Neural Network
 
Thesis defense improved
Thesis defense improvedThesis defense improved
Thesis defense improved
 
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
 

Viewers also liked

Focus group 27.09.2010 Sebastiano Lomuscio
Focus group 27.09.2010 Sebastiano LomuscioFocus group 27.09.2010 Sebastiano Lomuscio
Focus group 27.09.2010 Sebastiano LomuscioRoberto Galoppini
 
Mulherfortemulherdefor A
Mulherfortemulherdefor AMulherfortemulherdefor A
Mulherfortemulherdefor Acarolina
 
Launchpad_to_Quality
Launchpad_to_QualityLaunchpad_to_Quality
Launchpad_to_QualitySusi Doherty
 
Newsletter Lika Electronic Aprile 2016 IT
Newsletter Lika Electronic Aprile 2016 ITNewsletter Lika Electronic Aprile 2016 IT
Newsletter Lika Electronic Aprile 2016 ITLika Electronic
 
La mitat d'un quadrat
La mitat d'un quadratLa mitat d'un quadrat
La mitat d'un quadratPilar Otero
 
Introduction to tourism industry
Introduction to tourism industryIntroduction to tourism industry
Introduction to tourism industryGuruprasad Patel
 
Assoautomazione guida encoder
Assoautomazione guida encoderAssoautomazione guida encoder
Assoautomazione guida encoderLika Electronic
 
Communication
CommunicationCommunication
CommunicationUtchi
 
August White Paper 1/2015: Getting a Better Grip on External Spending
August White Paper 1/2015: Getting a Better Grip on External SpendingAugust White Paper 1/2015: Getting a Better Grip on External Spending
August White Paper 1/2015: Getting a Better Grip on External SpendingAugust Associates
 
Documentação técnica cerâmica contemporânea
Documentação técnica   cerâmica contemporâneaDocumentação técnica   cerâmica contemporânea
Documentação técnica cerâmica contemporâneaJoana
 
La sainte trinité du marketing web touristique
La sainte trinité du marketing web touristiqueLa sainte trinité du marketing web touristique
La sainte trinité du marketing web touristiqueFrederic Gonzalo
 
Présentation libqual qualité paris 24 juin 2013-francois mistral
Présentation libqual qualité paris 24 juin 2013-francois mistralPrésentation libqual qualité paris 24 juin 2013-francois mistral
Présentation libqual qualité paris 24 juin 2013-francois mistralAnita Beldiman-Moore de Sciences Po
 

Viewers also liked (20)

Dirección de recursos humanos
Dirección de recursos humanosDirección de recursos humanos
Dirección de recursos humanos
 
Focus group 27.09.2010 Sebastiano Lomuscio
Focus group 27.09.2010 Sebastiano LomuscioFocus group 27.09.2010 Sebastiano Lomuscio
Focus group 27.09.2010 Sebastiano Lomuscio
 
Mulherfortemulherdefor A
Mulherfortemulherdefor AMulherfortemulherdefor A
Mulherfortemulherdefor A
 
Mascotas
MascotasMascotas
Mascotas
 
Launchpad_to_Quality
Launchpad_to_QualityLaunchpad_to_Quality
Launchpad_to_Quality
 
Newsletter Lika Electronic Aprile 2016 IT
Newsletter Lika Electronic Aprile 2016 ITNewsletter Lika Electronic Aprile 2016 IT
Newsletter Lika Electronic Aprile 2016 IT
 
Indices 11 oct2013063504
Indices 11 oct2013063504Indices 11 oct2013063504
Indices 11 oct2013063504
 
Medias sociaux et relations publiques
Medias sociaux et relations publiquesMedias sociaux et relations publiques
Medias sociaux et relations publiques
 
La mitat d'un quadrat
La mitat d'un quadratLa mitat d'un quadrat
La mitat d'un quadrat
 
Introduction to tourism industry
Introduction to tourism industryIntroduction to tourism industry
Introduction to tourism industry
 
Assoautomazione guida encoder
Assoautomazione guida encoderAssoautomazione guida encoder
Assoautomazione guida encoder
 
Communication
CommunicationCommunication
Communication
 
Charles bargue :: Drawing Course
Charles bargue :: Drawing CourseCharles bargue :: Drawing Course
Charles bargue :: Drawing Course
 
August White Paper 1/2015: Getting a Better Grip on External Spending
August White Paper 1/2015: Getting a Better Grip on External SpendingAugust White Paper 1/2015: Getting a Better Grip on External Spending
August White Paper 1/2015: Getting a Better Grip on External Spending
 
Documentação técnica cerâmica contemporânea
Documentação técnica   cerâmica contemporâneaDocumentação técnica   cerâmica contemporânea
Documentação técnica cerâmica contemporânea
 
1erenominationjovenelmoise
1erenominationjovenelmoise1erenominationjovenelmoise
1erenominationjovenelmoise
 
American Revolution
American RevolutionAmerican Revolution
American Revolution
 
La sainte trinité du marketing web touristique
La sainte trinité du marketing web touristiqueLa sainte trinité du marketing web touristique
La sainte trinité du marketing web touristique
 
Présentation libqual qualité paris 24 juin 2013-francois mistral
Présentation libqual qualité paris 24 juin 2013-francois mistralPrésentation libqual qualité paris 24 juin 2013-francois mistral
Présentation libqual qualité paris 24 juin 2013-francois mistral
 
Como la minería afecta el medio ambiente
Como la minería afecta el medio ambienteComo la minería afecta el medio ambiente
Como la minería afecta el medio ambiente
 

Similar to Bilinear text regression and applications

Mining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media contentMining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media contentVasileios Lampos
 
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)Christian Ott
 
Isec2012 o hara
Isec2012 o haraIsec2012 o hara
Isec2012 o haraBob O'Hara
 
power system operation and control unit commitment .pdf
power system operation and control unit commitment .pdfpower system operation and control unit commitment .pdf
power system operation and control unit commitment .pdfArnabChakraborty499766
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDChristos Kallidonis
 
Reading papers - survey on Non-Convex Optimization
Reading papers - survey on Non-Convex OptimizationReading papers - survey on Non-Convex Optimization
Reading papers - survey on Non-Convex OptimizationX 37
 
A Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaA Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaINRIA-OAK
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive ModellingRajiv Advani
 
Introduction to core science models
Introduction to core science models Introduction to core science models
Introduction to core science models yingfeng
 
Value objects in JS - an ES7 work in progress
Value objects in JS - an ES7 work in progressValue objects in JS - an ES7 work in progress
Value objects in JS - an ES7 work in progressBrendan Eich
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureNecip Oguz Serbetci
 
Stochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsStochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsSeonho Park
 
On Security and Sparsity of Linear Classifiers for Adversarial Settings
On Security and Sparsity of Linear Classifiers for Adversarial SettingsOn Security and Sparsity of Linear Classifiers for Adversarial Settings
On Security and Sparsity of Linear Classifiers for Adversarial SettingsPluribus One
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
Dealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in StanDealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in StanHiroki Itô
 

Similar to Bilinear text regression and applications (20)

Mining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media contentMining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media content
 
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)
 
Isec2012 o hara
Isec2012 o haraIsec2012 o hara
Isec2012 o hara
 
power system operation and control unit commitment .pdf
power system operation and control unit commitment .pdfpower system operation and control unit commitment .pdf
power system operation and control unit commitment .pdf
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCD
 
Reading papers - survey on Non-Convex Optimization
Reading papers - survey on Non-Convex OptimizationReading papers - survey on Non-Convex Optimization
Reading papers - survey on Non-Convex Optimization
 
A Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaA Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social Media
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
Introduction to core science models
Introduction to core science models Introduction to core science models
Introduction to core science models
 
Classification
ClassificationClassification
Classification
 
ML基本からResNetまで
ML基本からResNetまでML基本からResNetまで
ML基本からResNetまで
 
SocialCom 2013
SocialCom 2013SocialCom 2013
SocialCom 2013
 
Value objects in JS - an ES7 work in progress
Value objects in JS - an ES7 work in progressValue objects in JS - an ES7 work in progress
Value objects in JS - an ES7 work in progress
 
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
 
Stochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithmsStochastic optimization from mirror descent to recent algorithms
Stochastic optimization from mirror descent to recent algorithms
 
Slides ub-2
Slides ub-2Slides ub-2
Slides ub-2
 
On Security and Sparsity of Linear Classifiers for Adversarial Settings
On Security and Sparsity of Linear Classifiers for Adversarial SettingsOn Security and Sparsity of Linear Classifiers for Adversarial Settings
On Security and Sparsity of Linear Classifiers for Adversarial Settings
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
Dealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in StanDealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in Stan
 

More from Vasileios Lampos

Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...Vasileios Lampos
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applicationsVasileios Lampos
 
Mining online data for public health surveillance
Mining online data for public health surveillanceMining online data for public health surveillance
Mining online data for public health surveillanceVasileios Lampos
 
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Vasileios Lampos
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...Vasileios Lampos
 
Extracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataExtracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataVasileios Lampos
 
Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...Vasileios Lampos
 

More from Vasileios Lampos (8)

Quicksort
QuicksortQuicksort
Quicksort
 
Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 
Mining online data for public health surveillance
Mining online data for public health surveillanceMining online data for public health surveillance
Mining online data for public health surveillance
 
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...
 
Extracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataExtracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual data
 
Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...
 

Recently uploaded

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 

Recently uploaded (20)

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 

Bilinear text regression and applications

  • 1. Bilinear Text Regression and Applications Vasileios Lampos Department of Computer Science University College London March, 2015 v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 1/45 1 /45
  • 2. Outline ⊥ Linear Regression Methods Bilinear Regression Methods Applications |= Conclusions v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 2/45 2 /45
  • 3. Recap on regression methods v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 3/45 3 /45
  • 4. Regression basics — Ordinary Least Squares (1/2) • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] Ordinary Least Squares (OLS) argmin www,β n i=1  yi − β − m j=1 xijwj   2 or in matrix form argmin www∗ XXX∗www∗ − yyy 2 2 , where XXX∗ = [XXX diag (III)] ⇒ www∗ = XXXT ∗ XXX∗ −1 XXXT ∗ yyy v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 4/45 4 /45
  • 5. Regression basics — Ordinary Least Squares (2/2) • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] Ordinary Least Squares (OLS) argmin www∗ XXX∗www∗ − yyy 2 2 ⇒ www∗ = XXXT ∗ XXX∗ −1 XXXT ∗ yyy Why not? −−− XXXT ∗ XXX∗ may be singular (thus difficult to invert) −−− high-dimensional models difficult to interpret −−− unsatisfactory prediction accuracy (estimates have large variance) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 5/45 5 /45
  • 6. Regression basics — Ridge Regression (1/2) • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] Ridge Regression (RR) www∗ = XXXT ∗ XXX∗ + λIII non singular −1 XXXT ∗ yyy (Hoerl & Kennard, 1970) argmin www,β    n i=1  yi − β − m j=1 xijwj   2 + λ m j=1 w2 j    or argmin www∗ XXX∗www∗ − yyy 2 2 + λ www 2 2 v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 6/45 6 /45
  • 7. Regression basics — Ridge Regression (2/2) • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] Ridge Regression (RR) argmin www∗ XXX∗www∗ − yyy 2 2 + λ www 2 2 +++ size constraint on the weight coefficients (regularisation) → resolves problems caused by collinear variables +++ less degrees of freedom, better predictive accuracy than OLS −−− does not perform feature selection (nonzero coefficients) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 7/45 7 /45
  • 8. Regression basics — Lasso • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] 111–norm regularisation or lasso (Tibshirani, 1996) argmin www,β    n i=1  yi − β − m j=1 xijwj   2 + λ m j=1 |wj|    or argmin www∗ XXX∗www∗ − yyy 2 2 + λ www 1 −−− no closed form solution — quadratic programming problem +++ Least Angle Regression explores entire reg. path (Efron et al., 2004) +++ sparse www, interpretability, better performance (Hastie et al., 2009) −−− if m > n, at most n variables can be selected −−− strongly corr. predictors → model-inconsistent (Zhao & Yu, 2009) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 8/45 8 /45
  • 9. Regression basics — Lasso for Text Regression • n-gram frequencies xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • flu rates yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] 111–norm regularisation or lasso or argmin www∗ XXX∗www∗ − yyy 2 2 + λ www 1 ‘unwel’, ‘temperatur’, ‘headach’, ‘appetit’, ‘symptom’, ‘diarrhoea’, ‘muscl’, ‘feel’, ... 180 200 220 240 260 280 300 320 340 0 50 100 Day Number (2009) Flurate HPA Inferred 0 10 20 30 40 50 60 70 80 90 0 50 100 150 Days Flurate HPA Inferred A B C D E Figure 1 : Flu rate predictions for the UK by applying lasso on Twitter data (Lampos & Cristianini, 2010) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 9/45 9 /45
  • 10. Regression basics — Elastic Net • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] [Linear] Elastic Net (LEN) (Zhou & Hastie, 2005) argmin www∗    XXX∗www∗ − yyy 2 2 OLS + λ1 www 2 2 RR reg. + λ2 www 1 Lasso reg.    +++ ‘compromise’ between ridge regression (handles collinear predictors) and lasso (favours sparsity) +++ entire reg. path can be explored by modifying LAR +++ if m > n, number of selected variables not limited to n −−− may select redundant variables! v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 10/45 10 /45
  • 11. Would a slightly different text regression approach be more suitable for Social Media content? v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 11/45 11 /45
  • 12. About Twitter (1/2) Tweet Examples @PaulLondon: I would strongly support a coalition government. It is the best thing for our country right now. #electionsUK2010 @JohnsonMP: Socialism is something forgotten in our country #supportLabour @FarageNOT: Far-right ‘movements’ come along with crises in capitalism #UKIP @JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!! Bieber rules, peace and love ♥ ♥ ♥ The Twitter basics: • 140 characters per status (tweet) • users follow and be followed • embedded usage of topics (#elections) • retweets (RT), @replies, @mentions, favourites • real-time nature • biased user demographics v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 12/45 12 /45
  • 13. About Twitter (2/2) Tweet Examples @PaulLondon: I would strongly support a coalition government. It is the best thing for our country right now. #electionsUK2010 @JohnsonMP: Socialism is something forgotten in our country #supportLabour @FarageNOT: Far-right ‘movements’ come along with crises in capitalism #UKIP @JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!! Bieber rules, peace and love ♥ ♥ ♥ • contains a vast amount of information about various topics • this information (XXX) can be used to assist predictions (yyy) (Lampos & Cristianini, 2012; Sakaki et al., 2010; Bollen et al., 2011) −−− f : XXX → yyy, f usually formulates a linear regression task −−− XXX represents word frequencies only... +++ is it possible to incorporate a user contribution somehow? word selection +++ user selection v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 13/45 13 /45
  • 14. Bi-linear Text Regression v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 14/45 14 /45
  • 15. Bilinear Text Regression — The general idea (1/2) Linear regression: f (xxxi) = xxxT i www + β • observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β] Bilinear regression: f (QQQi) = uuuTQQQiwww + β • users p ∈ Z+ • observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias uk, wj, β ∈ R, k ∈ {1, ..., p} — uuu, www, β j ∈ {1, ..., m} v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 15/45 15 /45
  • 16. Bilinear Text Regression — The general idea (2/2) • users p ∈ Z+ • observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias uk, wj, β ∈ R, k ∈ {1, ..., p} — uuu, www, β j ∈ {1, ..., m} f (QQQi) = uuuTQQQiwww + β × × + β uuuT QQQi www v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 16/45 16 /45
  • 17. Bilinear Text Regression — Regularisation • users p ∈ Z+ • observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX • responses yi ∈ R, i ∈ {1, ..., n} — yyy • weights, bias uk, wj, β ∈ R, k ∈ {1, ..., p} — uuu, www, β j ∈ {1, ..., m} argmin uuu,www,β n i=1 uuuT QQQiwww + β − yi 2 + ψ(uuu, θu) + ψ(www, θw) ψ(·): regularisation function with a set of hyper-parameters (θ) • if ψ (vvv, λ) = λ vvv 1 Bilinear Lasso • if ψ (vvv, λ1, λ2) = λ1 vvv 2 2 + λ2 vvv 1 Bilinear Elastic Net (BEN) (Lampos et al., 2013) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 17/45 17 /45
  • 18. Bilinear Elastic Net (BEN) argmin uuu,www,β n i=1 uuuT QQQiwww + β − yi 2 BEN’s objective function + λu1 uuu 2 2 + λu2 uuu 1 + λw1 www 2 2 + λw2 www 1 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 0 0.4 0.8 1.2 1.6 2 2.4 Step Global Objective RMSE Figure 2 : Objective function value and RMSE (on hold-out data) through the model’s iterations • Bi-convexity: fix uuu, learn www and vv • Iterating through convex optimisation tasks: convergence (Al-Khayyal & Falk, 1983; Horst & Tuy, 1996) • FISTA (Beck & Teboulle, 2009) in SPAMS (Mairal et al., 2010): Large-scale optimisation solver, quick convergence v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 18/45 18 /45
  • 19. Multi-Task Learning v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 19/45 19 /45
  • 20. Multi-Task Learning What • Instead of learning/optimising a single task (one target variable) • ... optimise multiple tasks jointly Why (Caruana, 1997) • improves generalisation performance exploiting domain-specific information of related tasks • a good choice for under-sampled distributions — knowledge transfer • application-driven reasons (e.g. explore interplay between political parties) How • Multi-task regularised regression v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 20/45 20 /45
  • 21. The 2,12,12,1-norm regularisation WWW 2,1 = m j=1 WWWj 2 , where WWWj denotes the j-th row 2,12,12,1-norm regularisation argmin WWW,βββ    XXXWWW − YYY 2 F + λ m j=1 WWWj 2    • multi-task learning: instead of www ∈ Rm, learn WWW ∈ Rm×τ , where τ is the number of tasks • 2,1-norm regularisation, i.e. the sum of WWW’s row 2-norms (Argyriou et al., 2008; Liu et al., 2009) extends the notion of group lasso (Yuan & Lin, 2006) • group lasso: instead of single variables, selects groups of variables • ‘groups’ now become the τ-dimensional rows of WWW v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 21/45 21 /45
  • 22. Bilinear +++ Multi-Task Learning v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 22/45 22 /45
  • 23. Bilinear Multi-Task Learning • tasks τ ∈ Z+ • users p ∈ Z+ • observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX • responses yyyi ∈ Rτ , i ∈ {1, ..., n} — YYY • weights, bias uuuk,wwwj,βββ ∈ Rτ , k ∈ {1, ..., p} — UUU, WWW, βββ j ∈ {1, ..., m} f (QQQi) = tr UUUTQQQiWWW + βββ × × UUUT QQQi WWW v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 23/45 23 /45
  • 24. Bilinear Group 2,12,12,1 (BGL) (1/2) • tasks τ ∈ Z+ • users p ∈ Z+ • observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX • responses yyyi ∈ Rτ , i ∈ {1, ..., n} — YYY • weights, bias uuuk,wwwj,βββ ∈ Rτ , k ∈ {1, ..., p} — UUU, WWW, βββ j ∈ {1, ..., m} argmin UUU,WWW,βββ τ t=1 n i=1 uuuT t QQQiwwwt + βt − yti 2 + λu p k=1 UUUk 2 + λw m j=1 WWWj 2 • BGL can be broken into 2 convex tasks: first learn {WWW,βββ}, then {UUU,βββ} and vv + iterate through this process v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 24/45 24 /45
  • 25. Bilinear Group 2,12,12,1 (BGL) (2/2) argmin UUU,WWW,βββ τ t=1 n i=1 uuuT t QQQiwwwt + βt − yti 2 + λu p k=1 UUUk 2 + λw m j=1 WWWj 2 × × UUUT QQQi WWW • a feature (user/word) is selected for all tasks (not just one), but possibly with different weights • especially useful in the domain of politics (e.g. user pro party A, against party B) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 25/45 25 /45
  • 26. Voting Intention Modelling (Lampos et al., 2013) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 26/45 26 /45
  • 27. Political Opinion/Voting Intention Mining — Brief Recap Primary papers • predict the result of an election via Twitter (Tumasjan et al., 2010) • model socio-political sentiment polls (O’Connor et al., 2010) • above 2 failed on 2009 US congr. elections (Gayo-Avello, 2011) • desired properties of such models (Metaxas et al., 2011) Features • lexicon-based, e.g. using LIWC (Tausczik & Pennebaker, 2010) • task-specific keywords (names of parties, politicians) • tweet volume reviewed in (Gayo-Avello, 2013) −−− political descriptors change in time, differ per country −−− personalised modelling (present in actual polls) missing −−− multi-task learning? v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 27/45 27 /45
  • 28. Voting Intention Modelling — Data (United Kingdom) • 42K users distributed proportionally to regional population figures • 60m tweets from 30/04/2010 to 13/02/2012 • 80, 976 unigrams (word features) • 240 voting intention polls (YouGov) • 3 parties: Conservatives (CON), Labour Party (LAB), Liberal Democrats (LIB) • main language: English 5 30 55 80 105 130 155 180 205 230 0 5 10 15 20 25 30 35 40 45 VotingIntention% Time CON LAB LIB Figure 3 : Voting intention time series for the UK (YouGov) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 28/45 28 /45
  • 29. Voting Intention Modelling — Data (Austria) • 1.1K users manually selected by Austrian political analysts (SORA) • 800K tweets from 25/01 to 01/12/2012 • 22, 917 unigrams (word features) • 98 voting intention polls from various pollsters • 4 parties: Social Democratic Party (SP¨O), People’s Party (¨OVP), Freedom Party (FP¨O), Green Alternative Party (GR¨U) • main language: German 5 20 35 50 65 80 95 0 5 10 15 20 25 30 VotingIntention% Time SPÖ ÖVP FPÖ GRÜ Figure 4 : Voting intention time series for Austria v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 29/45 29 /45
  • 30. Voting Intention Modelling — Evaluation • 10-fold validation −−− train a model using data based on a set of contiguous polls A −−− test on the next D = 5 polls −−− expand training set to {A ∪ D}, test on the next |D | = 5 polls • realistic scenario: train on past, predict future polls • overall we test predictions on 50 polls (in each case study) Baselines • Bµµµ: constant prediction based on µ(yyy) in the training set • Blast: constant prediction based on last(yyy) in the training set • LEN: (linear) Elastic Net prediction (using word frequencies) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 30/45 30 /45
  • 31. Voting Intention Modelling — Performance tables Average RMSEs on the voting intention percentage predictions in the 10-step validation process Table 1 : UK case study CON LAB LIB µµµ Bµµµ 2.272 1.663 1.136 1.69 Blast 2 2.074 1.095 1.723 LEN 3.845 2.912 2.445 3.067 BEN 1.939 1.644 1.136 1.573 BGL 1.7851.7851.785 1.5951.5951.595 1.0541.0541.054 1.4781.4781.478 Table 2 : Austrian case study SP¨O ¨OVP FP¨O GR¨U µµµ Bµµµ 1.535 1.373 3.3 1.197 1.851 Blast 1.1481.1481.148 1.556 1.6391.6391.639 1.536 1.47 LEN 1.291 1.286 2.039 1.1521.1521.152 1.442 BEN 1.392 1.31 2.89 1.205 1.699 BGL 1.619 1.0051.0051.005 1.757 1.374 1.4391.4391.439 v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 31/45 31 /45
  • 32. Voting Intention Modelling — Prediction figures Polls BEN BGL UK 5 10 15 20 25 30 35 40 45 0 5 10 15 20 25 30 35 40 VotingIntention% Time CON LAB LIB 5 10 15 20 25 30 35 40 45 0 5 10 15 20 25 30 35 40 VotingIntention% Time CON LAB LIB 5 10 15 20 25 30 35 40 45 0 5 10 15 20 25 30 35 40 VotingIntention% Time CON LAB LIB Austria 5 10 15 20 25 30 35 40 45 0 5 10 15 20 25 30 VotingIntention% Time SPÖ ÖVP FPÖ GRÜ 5 10 15 20 25 30 35 40 45 0 5 10 15 20 25 30 VotingIntention% Time SPÖ ÖVP FPÖ GRÜ 5 10 15 20 25 30 35 40 45 0 5 10 15 20 25 30 VotingIntention% Time SPÖ ÖVP FPÖ GRÜ Figure 5 : Performance figures for BEN and BGL in the UK/Austria case studies v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 32/45 32 /45
  • 33. Voting Intention Modelling — Qualitative evaluation Party Tweet Score Author CON PM in friendly chat with top EU mate, Sweden’s Fredrik Reinfeldt, before family photo 1.334 Journalist Have Liberal Democrats broken electoral rules? Blog on Labour com- plaint to cabinet secretary −0.991 Journalist LAB I am so pleased to hear Paul Savage who worked for the Labour group has been Appointed the Marketing manager for the baths hall GREAT NEWS −0.552 Politician (Labour) LBD RT @user: Must be awful for TV bosses to keep getting knocked back by all the women they ask to host election night (via @user) 0.874 LibDem MP SP¨O Inflationsrate in ¨O. im Juli leicht gesunken: von 2,2 auf 2,1%. Teurer wurde Wohnen, Wasser, Energie. Translation: Inflation rate in Austria slightly down in July from 2,2 to 2,1%. Accommodation, Water, Energy more expensive. 0.745 Journalist ¨OVP kann das buch “res publica” von johannes #voggenhuber wirklich empfehlen! so zum nachdenken und so... #europa #demokratie Translation: can really recommend the book “res publica” by johannes #voggenhuber! Food for thought and so on #europe #democracy −2.323 User FP¨O Neue Kampagne der #Krone zur #Wehrpflicht: “GIB BELLO EINE STIMME!” Translation: New campaign by the #Krone on #Conscription: “GIVE WOOFY A VOICE!” 7.44 Political satire GR¨U Protestsong gegen die Abschaffung des Bachelor-Studiums Interna- tionale Entwicklung: <link> #IEbleibt #unibrennt #uniwut Translation: Protest songs against the closing-down of the bachelor course of International Development: <link> #IDremains #uniburns #unirage 1.45 Student Union Table 3 : Scored tweet examples from both case studies using BGL v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 33/45 33 /45
  • 34. Extracting Socioeconomic Patterns from the News (Lampos et al., 2014) v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 34/45 34 /45
  • 35. Socioeconomic Patterns — Data News Summaries • Open Europe Think Tank: summaries of news articles on EU or member countries (focus on politics, perhaps right-wing biased!) • from February 2006 to mid-November 2013 1913 days or 94 months or 888 years • involving 435435435 international news outlets • extracted 8, 413 unigrams and 19, 045 bigrams Socioeconomic Indicators • EU Economic Sentiment Indicator (ESI) → predictor for future economic developments (Gelper & Croux, 2010) → consists of 5 weighted confidence sub-indicators: ◦ industrial (40%), services (30%), consumer (20%) construction (5%), retail trade (5%) • EU Unemployment — seasonally adjusted ratio of the non employed over the entire EU labour force v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 35/45 35 /45
  • 36. Socioeconomic Patterns — Task description +++ qualitative differences to voting intention modelling ◦ aim is NOT to predict socioeconomic indicators ◦ characterise news by conducting a supervised analysis on them driven by socioeconomic factors +++ use predictive performance as an informal guarantee that the model is reasonable +++ the better the predictive performance, the more trustful the extracted patterns should be Slightly modified BEN argmin ooo≥0,www,β n i=1 oooT QQQiwww + β − yi 2 + λo1 ooo 2 2 + λo2 ooo 1 + λw1 www 2 2 + λw2 www 1 • min (ooo) ≥ 0 to enhance weight interpretability for both news outlets and n-grams v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 36/45 36 /45
  • 37. Socioeconomic Patterns — Predictive performance • similar evaluation as in voting intention prediction • differences: time frame is now a month, train using a moving window of 64 contiguous months, test on the next 3 months • make predictions for a total of 30 months 2007 2008 2009 2010 2011 2012 2013 0 50 100 actual predictions 2007 2008 2009 2010 2011 2012 2013 0 5 10 actual predictions Figure 6 : Monthly rates of EU-wide ESI (right) and Unemployment (left) together with BEN’s predictions for the last 30 months ESI Unemployment LEN 9.253 (9.89%) 0.9275 (8.75%) BEN 8.2098.2098.209 (8.77%) 0.90470.90470.9047 (8.52%) Table 4 : 10-fold validation average RMSEs (and error rates) for LEN and BEN on ESI and unemployment rates prediction v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 37/45 37 /45
  • 38. Socioeconomic Patterns — Qualitative analysis (ESI) Frequency Word Outlet Weight a a Polarity Yes Yes + - Figure 7 : Visualisation of BEN’s outputs for EU’s ESI in the last fold (i.e. model trained on 64 months up to August 2013). The word cloud depicts the top-60 positively and negatively weighted n-grams (120) in total together with the top-30 outlets. v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 38/45 38 /45
  • 39. Socioeconomic Patterns — Qualitative analysis (Unempl.) Frequency Word Outlet Weight a a Polarity Yes Yes + - Figure 8 : Visualisation of BEN’s outputs for EU-Unemployment in the last fold (i.e. model trained on 64 months up to August 2013). The word cloud depicts the top-60 positively and negatively weighted n-grams (120) in total together with the top-30 outlets. v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 39/45 39 /45
  • 40. Conclusions +++ introduced a new class of methods for bilinear text regression +++ directly applicable to Social Media content +++ or other types of textual content such as news articles +++ better predictive performance than the linear alternative (in the investigated case studies) +++ extended to bilinear multi-task learning To do −−− investigate finer grained modelling settings by applying different regularisation functions (or different combinations of them) −−− further understand the properties of bilinear versus linear text regression, e.g. when and why is it a good choice or how different combinations of regularisation settings affect performance −−− task-specific improvements v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 40/45 40 /45
  • 41. In collaboration with Trevor Cohn, University of Melbourne Daniel Preot¸iuc-Pietro, University of Pennsylvania Sina Samangooei, Amazon Research Douwe Gelling, University of Sheffield v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 41/45 41 /45
  • 42. Thank you Any questions? Download the slides from http://www.lampos.net/research/talks-posters v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 42/45 42 /45
  • 43. References I Al-Khayyal and Falk. Jointly Constrained Biconvex Programming. MOR, 1983. Argyriou, Evgeniou and Pontil. Convex multi-task feature learning. Machine Learning, 2008. Beck and Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. J. Imaging Sci., 2009. Bermingham and Smeaton. On using Twitter to monitor political sentiment and predict election results. SAAIP, 2011. Bollen, Mao and Zeng. Twitter mood predicts the stock market. JCS, 2011. Caruana. Multitask Learning. Machine Learning, 1997. Efron, Hastie, Johnstone and Tibshirani. Least Angle Regression. The Annals of Statistics, 2004. Gayo-Avello. A Meta-Analysis of State-of-the-Art Electoral Prediction From Twitter Data. SSCR, 2013. Gayo-Avello, Metaxas and Mustafaraj. Limits of Electoral Predictions using Twitter. ICWSM, 2011. Gelper and Croux. On the construction of the European Economic Sentiment Indicator. OBES, 2010. Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. 2009. Hoerl and Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 1970. v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 43/45 43 /45
  • 44. References II Horst and Tuy. Global Optimization: Deterministic Approaches. 1996. Lampos and Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP, 2010. Lampos and Cristianini. Nowcasting Events from the Social Web with Statistical Learning. ACM TIST, 2012. Lampos, Preot¸iuc-Pietro and Cohn. A user-centric model of voting intention from Social Media. ACL, 2013. Lampos, Preot¸iuc-Pietro, Samangooei, Gelling and Cohn. Extracting Socioeconomic Patterns from the News: Modelling Text and Outlet Importance Jointly. ACL LACSS, 2014. Liu, Ji and Ye. Multi-task feature learning via efficient 2,12,12,1-norm minimization. UAI, 2009. Mairal, Jenatton, Obozinski and Bach. Network Flow Algorithms for Structured Sparsity. NIPS, 2010. Metaxas, Mustafaraj and Gayo-Avello. How (not) to predict elections. SocialCom, 2011. O’Connor, Balasubramanyan, Routledge and Smith. From Tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 2010. Pirsiavash, Ramanan and Fowlkes. Bilinear classifiers for visual recognition. NIPS, 2009. Quesada and Grossmann. A global optimization algorithm for linear fractional and bilinear programs. JGO, 1995. v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 44/45 44 /45
  • 45. References III Sakaki, Okazaki and Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. WWW, 2010. Tausczik and Pennebaker. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. JLSP, 2010. Tibshirani. Regression Shrinkage and Selection via the LASSO. JRSS, 1996. Tumasjan, Sprenger, Sandner and Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 2010. Yuan and Lin. Model selection and estimation in regression with grouped variables. JRSS, 2006. Zhao and Yu. On model selection consistency of LASSO. JMLR, 2006. Zhou and Hastie. Regularization and variable selection via the elastic net. JRSS, 2005. v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 45/45 45 /45