SlideShare a Scribd company logo
1 of 1
Download to read offline
Funded	by	the	EPSRC	IRC	project	“i-sense” Selected	References	
Lampos	et	al.	Predicting	and	Characterising	User	Impact	on	Twitter.	EACL,	2014.	
Preotiuc-Pietro	et	al.	An	analysis	of	the	user	occupational	class	through	Twitter	content.	ACL,	2015.	
Preotiuc-Pietro	et	al.	Studying	User	Income	through	Language,	Behaviour	and	Affect	in	Social	Media.	PLoS	ONE,	2015.	
Rasmussen	and	Williams.	Gaussian	Processes	for	Machine	Learning.	MIT	Press,	2006.
Download	the	data	set
Summary.	 We	 present	 a	 method	 for	 determining	 the	
socioeconomic	 status	 of	 a	 social	 media	 (Twitter)	 user.	
Initially,	 we	 formulate	 a	 3-way	 classification	 task,	 where	
users	 are	 classified	 as	 having	 an	 upper,	 middle	 or	 lower	
socioeconomic	status.	A	nonlinear	learning	approach	using	
a	 composite	 Gaussian	 Process	 kernel	 provides	 a	
classification	 accuracy	 of	 75%.	 By	 turning	 this	 task	 into	 a	
binary	classification	–	upper	vs.	medium	and	lower	class	–	
the	proposed	classifier	reaches	an	accuracy	of	82%.
Profile	description	
football	player	at	
Liverpool	FC	
tweets	from	the	best	
barista	in	London	
estate	agent,	stamp	
collector	&	proud	mother	
(523	1-grams	&	2-grams)
Behaviour	
%	re-tweets

%	@mentions

%	unique	@mentions	
%	@replies
Impact	
#	followers

#	followees

#	listed

‘impact’	score
Topics	of	
discussion	
Corporate

Education

Internet	Slang

Politics

Shopping

Sports

Vacation

…	
(200	topics)
Text	in	posts	
(560	1-grams)
c1
c2 c3
c4
c5
Twitter	user	attributes	(feature	categories)
A
How	is	a	user	profile	mapped	to	a	socioeconomic	status?
Profile	description	
on	Twitter	
Occupation SOC	category1 NS-SEC2
1. Standard	Occupational	Classification:	369	job	groupings	
2. National	Statistics	Socio-Economic	Classification:	Map	from	
the	 job	 groupings	 in	 SOC	 to	 a	 socioeconomic	 status,	 i.e.	
{upper,	middle	or	lower}
B
Data	sets
T1:	1,342	Twitter	user	profiles,	2	million	tweets,	from	February	
1,	 2014	 to	 March	 21,	 2015;	 profiles	 are	 labelled	 with	 a	 socio-
economic	status	
T2:	160	million	tweets,	sample	of	UK	Twitter,	same	date	range	
with	T1,	used	to	learn	a	set	of	200	latent	topics
C
1-gram samples from a subset of the 200 latent topics (word clusters) ex-
utomatically from Twitter data (D2).
ic Sample of 1-grams
rate #business, clients, development, marketing, o ces, product
tion assignments, coursework, dissertation, essay, library, notes, studies
ily #family, auntie, dad, family, mother, nephew, sister, uncle
Slang ahahaha, awwww, hahaa, hahahaha, hmmmm, loooool, oooo, yay
ics #labour, #politics, #tories, conservatives, democracy, voters
ping #shopping, asda, bargain, customers, market, retail, shops, toys
rts #football, #winner, ball, bench, defending, footballer, goal, won
rtime #beach, #sea, #summer, #sunshine, bbq, hot, seaside, swimming
rism #jesuischarlie, cartoon, freedom, religion, shootings, terrorism
ams) and 560 (1-grams) respectively. Thus, a Twitter user in our data
resented by a 1, 291-dimensional feature vector.
pplied spectral clustering [12] on D2 to derive 200 (hard) clusters of
that capture a number of latent topics and linguistic expressions (e.g.
, ‘Sports’, ‘Internet Slang’), a snapshot of which is presented in Ta-
evious research has shown that this amount of clusters is adequate for
g a strong performance in similar tasks [7,13,14]. We then computed the
y of each topic in the tweets of D1 as described in feature category c5.
btain a SES label for each user account, we took advantage of the SOC
y’s characteristics [5]. In SOC, jobs are categorised based on the required
l and specialisation. At the top level, there exist 9 general occupation
and the scheme breaks down to sub-categories forming a 4-level struc-
e bottom of this hierarchy contains more specific job groupings (369 in
OC also provides a simplified mapping from these job groupings to a
efined by NS-SEC [17]. We used this mapping to assign an upper, mid-
wer SES to each user account in our data set. This process resulted in
and 314 users in the upper, middle and lower SES classes, respectively.2
assification Methods
a composite Gaussian Process (GP), described below, as our main
for performing classification. GPs can be defined as sets of random
, any finite number of which have a multivariate Gaussian distribution
mally, GP methods aim to learn a function f : Rd
! R drawn from a
given the inputs x 2 Rd
:
f(x) ⇠ GP(m(x), k(x, x0
)) , (1)
(·) is the mean function (here set equal to 0) and k(·, ·) is the covari-
nel. We apply the squared exponential (SE) kernel, also known as the
ta set is available at http://dx.doi.org/10.6084/m9.figshare.1619703.
subset of the 200 latent topics (word clusters) ex-
r data (D2).
Sample of 1-grams
ents, development, marketing, o ces, product
rsework, dissertation, essay, library, notes, studies
tie, dad, family, mother, nephew, sister, uncle
w, hahaa, hahahaha, hmmmm, loooool, oooo, yay
itics, #tories, conservatives, democracy, voters
a, bargain, customers, market, retail, shops, toys
ner, ball, bench, defending, footballer, goal, won
summer, #sunshine, bbq, hot, seaside, swimming
cartoon, freedom, religion, shootings, terrorism
) respectively. Thus, a Twitter user in our data
mensional feature vector.
ng [12] on D2 to derive 200 (hard) clusters of
of latent topics and linguistic expressions (e.g.
ng’), a snapshot of which is presented in Ta-
wn that this amount of clusters is adequate for
n similar tasks [7,13,14]. We then computed the
weets of D1 as described in feature category c5.
ch user account, we took advantage of the SOC
SOC, jobs are categorised based on the required
the top level, there exist 9 general occupation
down to sub-categories forming a 4-level struc-
hy contains more specific job groupings (369 in
plified mapping from these job groupings to a
We used this mapping to assign an upper, mid-
ccount in our data set. This process resulted in
per, middle and lower SES classes, respectively.2
ds
Process (GP), described below, as our main
ation. GPs can be defined as sets of random
which have a multivariate Gaussian distribution
to learn a function f : Rd
! R drawn from a
d
:
⇠ GP(m(x), k(x, x0
)) , (1)
n (here set equal to 0) and k(·, ·) is the covari-
red exponential (SE) kernel, also known as the
://dx.doi.org/10.6084/m9.figshare.1619703.
Table 1. 1-gram samples from a subset of the 200 latent topics (word clusters) ex-
tracted automatically from Twitter data (D2).
Topic Sample of 1-grams
Corporate #business, clients, development, marketing, o ces, product
Education assignments, coursework, dissertation, essay, library, notes, studies
Family #family, auntie, dad, family, mother, nephew, sister, uncle
Internet Slang ahahaha, awwww, hahaa, hahahaha, hmmmm, loooool, oooo, yay
Politics #labour, #politics, #tories, conservatives, democracy, voters
Shopping #shopping, asda, bargain, customers, market, retail, shops, toys
Sports #football, #winner, ball, bench, defending, footballer, goal, won
Summertime #beach, #sea, #summer, #sunshine, bbq, hot, seaside, swimming
Terrorism #jesuischarlie, cartoon, freedom, religion, shootings, terrorism
plus 2-grams) and 560 (1-grams) respectively. Thus, a Twitter user in our data
set is represented by a 1, 291-dimensional feature vector.
We applied spectral clustering [12] on D2 to derive 200 (hard) clusters of
1-grams that capture a number of latent topics and linguistic expressions (e.g.
‘Politics’, ‘Sports’, ‘Internet Slang’), a snapshot of which is presented in Ta-
ble 1. Previous research has shown that this amount of clusters is adequate for
achieving a strong performance in similar tasks [7,13,14]. We then computed the
frequency of each topic in the tweets of D1 as described in feature category c5.
To obtain a SES label for each user account, we took advantage of the SOC
hierarchy’s characteristics [5]. In SOC, jobs are categorised based on the required
skill level and specialisation. At the top level, there exist 9 general occupation
groups, and the scheme breaks down to sub-categories forming a 4-level struc-
ture. The bottom of this hierarchy contains more specific job groupings (369 in
total). SOC also provides a simplified mapping from these job groupings to a
SES as defined by NS-SEC [17]. We used this mapping to assign an upper, mid-
dle or lower SES to each user account in our data set. This process resulted in
710, 318 and 314 users in the upper, middle and lower SES classes, respectively.2
3 Classification Methods
We use a composite Gaussian Process (GP), described below, as our main
method for performing classification. GPs can be defined as sets of random
variables, any finite number of which have a multivariate Gaussian distribution
[16]. Formally, GP methods aim to learn a function f : Rd
! R drawn from a
GP prior given the inputs x 2 Rd
:
f(x) ⇠ GP(m(x), k(x, x0
)) , (1)
where m(·) is the mean function (here set equal to 0) and k(·, ·) is the covari-
ance kernel. We apply the squared exponential (SE) kernel, also known as the
2
The data set is available at http://dx.doi.org/10.6084/m9.figshare.1619703.
Definition:
Kernel	
formulation:
ation mean performance as estimated via a 10-fold cross valida-
GP classifier for both problem specifications. Parentheses hold
timate.
Accuracy Precision Recall F-score
5.09% (3.28%) 72.04% (4.40%) 70.76% (5.65%) .714 (.049)
2.05% (2.41%) 82.20% (2.39%) 81.97% (2.55%) .821 (.025)
(RBF), defined as kSE(x, x0
) = ✓2
exp kx x0
k2
2/(2`2
) ,
nt that describes the overall level of variance and ` is re-
acteristic length-scale parameter. Note that ` is inversely
redictive relevancy of x (high values indicate a low degree
classification using GPs ‘squashes’ the real valued latent
through a logistic function: ⇡(x) , P(y = 1|x) = (f(x))
ogistic regression classification. In binary classification, the
latent f⇤ is combined with the logistic function to produceR
(f⇤)P(f⇤|x, y, x⇤)df⇤. The posterior formulation has a
od and thus, the model parameters can only be estimated.
use the Laplace approximation [16,18].
Table 2. SES classification mean performance as estimated via a 10-fold cross valida-
tion of the composite GP classifier for both problem specifications. Parentheses hold
the SD of the mean estimate.
Num. of classes Accuracy Precision Recall F-score
3 75.09% (3.28%) 72.04% (4.40%) 70.76% (5.65%) .714 (.049)
2 82.05% (2.41%) 82.20% (2.39%) 81.97% (2.55%) .821 (.025)
radial basis function (RBF), defined as kSE(x, x0
) = ✓2
exp kx x0
k2
2/(2`2
) ,
where ✓2
is a constant that describes the overall level of variance and ` is re-
ferred to as the characteristic length-scale parameter. Note that ` is inversely
proportional to the predictive relevancy of x (high values indicate a low degree
of relevance). Binary classification using GPs ‘squashes’ the real valued latent
function f(x) output through a logistic function: ⇡(x) , P(y = 1|x) = (f(x))
in a similar way to logistic regression classification. In binary classification, the
distribution over the latent f⇤ is combined with the logistic function to produce
the prediction ¯⇡⇤ =
R
(f⇤)P(f⇤|x, y, x⇤)df⇤. The posterior formulation has a
non-Gaussian likelihood and thus, the model parameters can only be estimated.
For this purpose we use the Laplace approximation [16,18].
Based on the property that the sum of covariance functions is also a valid
covariance function [16], we model the di↵erent user feature categories with a
di↵erent SE kernel. The final covariance function, therefore, becomes
k(x, x0
) =
CX
n=1
kSE(cn, c0
n)
!
+ kN(x, x0
) , (2)
where cn is used to express the features of each category, i.e., x = {c1, . . . , cC,},
C is equal to the number of feature categories (in our experimental setup, C = 5)
and kN(x, x0
) = ✓2
N ⇥ (x, x0
) models noise ( being a Kronecker delta func-
tion). Similar GP kernel formulations have been applied for text regression tasks
[7,9,11] as a way of capturing groupings of the feature space more e↵ectively.
Although related work has indicated the superiority of nonlinear approaches
in similar multimodal tasks [7,14], we also estimate a performance baseline us-
ing a linear method. Given the high dimensionality of our task, we apply logistic
regression with elastic net regularisation [6] for this purpose. As both classifica-
tion techniques can address binary tasks, we adopt the one–vs.–all strategy for
conducting an inference.
Table 2. SES classification mean performance as estimated via a 10-fold cross valida-
tion of the composite GP classifier for both problem specifications. Parentheses hold
the SD of the mean estimate.
Num. of classes Accuracy Precision Recall F-score
3 75.09% (3.28%) 72.04% (4.40%) 70.76% (5.65%) .714 (.049)
2 82.05% (2.41%) 82.20% (2.39%) 81.97% (2.55%) .821 (.025)
radial basis function (RBF), defined as kSE(x, x0
) = ✓2
exp kx x0
k2
2/(2`2
) ,
where ✓2
is a constant that describes the overall level of variance and ` is re-
ferred to as the characteristic length-scale parameter. Note that ` is inversely
proportional to the predictive relevancy of x (high values indicate a low degree
of relevance). Binary classification using GPs ‘squashes’ the real valued latent
function f(x) output through a logistic function: ⇡(x) , P(y = 1|x) = (f(x))
in a similar way to logistic regression classification. In binary classification, the
distribution over the latent f⇤ is combined with the logistic function to produce
the prediction ¯⇡⇤ =
R
(f⇤)P(f⇤|x, y, x⇤)df⇤. The posterior formulation has a
non-Gaussian likelihood and thus, the model parameters can only be estimated.
For this purpose we use the Laplace approximation [16,18].
Based on the property that the sum of covariance functions is also a valid
covariance function [16], we model the di↵erent user feature categories with a
di↵erent SE kernel. The final covariance function, therefore, becomes
k(x, x0
) =
CX
n=1
kSE(cn, c0
n)
!
+ kN(x, x0
) , (2)
where cn is used to express the features of each category, i.e., x = {c1, . . . , cC,},
C is equal to the number of feature categories (in our experimental setup, C = 5)
and kN(x, x0
) = ✓2
N ⇥ (x, x0
) models noise ( being a Kronecker delta func-
tion). Similar GP kernel formulations have been applied for text regression tasks
[7,9,11] as a way of capturing groupings of the feature space more e↵ectively.
Although related work has indicated the superiority of nonlinear approaches
in similar multimodal tasks [7,14], we also estimate a performance baseline us-
ing a linear method. Given the high dimensionality of our task, we apply logistic
regression with elastic net regularisation [6] for this purpose. As both classifica-
tion techniques can address binary tasks, we adopt the one–vs.–all strategy for
conducting an inference.
ce as estimated via a 10-fold cross valida-
problem specifications. Parentheses hold
ecision Recall F-score
% (4.40%) 70.76% (5.65%) .714 (.049)
% (2.39%) 81.97% (2.55%) .821 (.025)
SE(x, x0
) = ✓2
exp kx x0
k2
2/(2`2
) ,
e overall level of variance and ` is re-
le parameter. Note that ` is inversely
of x (high values indicate a low degree
GPs ‘squashes’ the real valued latent
unction: ⇡(x) , P(y = 1|x) = (f(x))
ssification. In binary classification, the
d with the logistic function to produce
)df⇤. The posterior formulation has a
odel parameters can only be estimated.
roximation [16,18].
of covariance functions is also a valid
di↵erent user feature categories with a
function, therefore, becomes
n, c0
n)
!
+ kN(x, x0
) , (2)
each category, i.e., x = {c1, . . . , cC}, C
ies (in our experimental setup, C = 5)
oise ( being a Kronecker delta func-
e been applied for text regression tasks
of the feature space more e↵ectively.
the superiority of nonlinear approaches
so estimate a performance baseline us-
nsionality of our task, we apply logistic
[6] for this purpose. As both classifica-
, we adopt the one–vs.–all strategy for
erformance as estimated via a 10-fold cross valida-
for both problem specifications. Parentheses hold
Precision Recall F-score
) 72.04% (4.40%) 70.76% (5.65%) .714 (.049)
) 82.20% (2.39%) 81.97% (2.55%) .821 (.025)
ned as kSE(x, x0
) = ✓2
exp kx x0
k2
2/(2`2
) ,
ribes the overall level of variance and ` is re-
ngth-scale parameter. Note that ` is inversely
evancy of x (high values indicate a low degree
n using GPs ‘squashes’ the real valued latent
ogistic function: ⇡(x) , P(y = 1|x) = (f(x))
sion classification. In binary classification, the
combined with the logistic function to produce
|x, y, x⇤)df⇤. The posterior formulation has a
, the model parameters can only be estimated.
ace approximation [16,18].
he sum of covariance functions is also a valid
el the di↵erent user feature categories with a
ariance function, therefore, becomes
X
1
kSE(cn, c0
n)
!
+ kN(x, x0
) , (2)
tures of each category, i.e., x = {c1, . . . , cC}, C
categories (in our experimental setup, C = 5)
odels noise ( being a Kronecker delta func-
ons have been applied for text regression tasks
upings of the feature space more e↵ectively.
dicated the superiority of nonlinear approaches
], we also estimate a performance baseline us-
gh dimensionality of our task, we apply logistic
isation [6] for this purpose. As both classifica-
y tasks, we adopt the one–vs.–all strategy for
where
Formulating	a	Gaussian	Process	classifier E
Topics	(word	clusters)	are	formed	by	applying	

spectral	clustering	on	daily	word	frequencies	in	T2.	
Examples	of	topics	with	word	samples	
Corporate:	#business,	clients,	development,	marketing,	offices	
Education:	assignments,	coursework,	dissertation,	essay,	library	
Internet	Slang:	ahahaha,	awwww,	hahaa,	hahahaha,	hmmmm	
Politics:	#labour,	#politics,	#tories,	conservatives,	democracy	
Shopping:	#shopping,	asda,	bargain,	customers,	market,	retail	
Sports:	#football,	#winner,	ball,	bench,	defending,	footballer
D
Classification Accuracy	(%) Precision	(%) Recall	(%) F1
2-way 82.05	(2.4) 82.2	(2.4) 81.97	(2.6) .821	(.03)
3-way 75.09	(3.3) 72.04	(4.4) 70.76	(5.7) .714	(.05)
Classification	performance	(10-fold	CV)
T1 T2 P
O1 584 115 83.5%
O2 126 517 80.4%
R 82.3% 81.8% 82.0%
T1 T2 T3 P
O1 606 84 53 81.6%
O2 49 186 45 66.4%
O3 55 48 216 67.7%
R 854% 58.5% 68.8% 75.1%
Confusion	matrices	(aggregate)
O	=	output	(inferred),	T	=	target,	P	=	precision,	R	=	recall	
{1,	2,	3}	=	{upper,	middle,	lower}	socioeconomic	status
F
Conclusions.	(a)	First	approach	for	inferring	the	socioeconomic	
status	of	a	social	media	user,	(b)	75%	&	82%	accuracy	for	the	3-
way	and	binary	classification	tasks	respectively,	and	(c)	future	
work	 is	 required	 to	 evaluate	 this	 framework	 more	 rigorously	
and	to	analyse	underlying	qualitative	properties	in	detail.
Inferring	the	Socioeconomic	Status	of	Social	
Media	Users	based	on	Behaviour	&	Language	
Vasileios	Lampos,	Nikolaos	Aletras,		
Jens	K.	Geyti,	Bin	Zou	&	Ingemar	J.	Cox

More Related Content

What's hot

Higgs Boson Challenge
Higgs Boson ChallengeHiggs Boson Challenge
Higgs Boson ChallengeRaouf KESKES
 
MemFunc.doc
MemFunc.docMemFunc.doc
MemFunc.docbutest
 
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHESWEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHEScscpconf
 
Image pcm 09
Image pcm 09Image pcm 09
Image pcm 09anilbl
 
Sales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingSales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingNagarjun Kotyada
 
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...Happiest Minds Technologies
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicoreillidan2004
 
Reinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAIReinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAIRaouf KESKES
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataKathleneNgo
 
Analytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningAnalytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningcsandit
 
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGcsandit
 

What's hot (11)

Higgs Boson Challenge
Higgs Boson ChallengeHiggs Boson Challenge
Higgs Boson Challenge
 
MemFunc.doc
MemFunc.docMemFunc.doc
MemFunc.doc
 
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHESWEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
 
Image pcm 09
Image pcm 09Image pcm 09
Image pcm 09
 
Sales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingSales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R Programming
 
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore
 
Reinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAIReinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAI
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports Data
 
Analytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningAnalytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion mining
 
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
 

Viewers also liked

A B C Media Report Al Jazeera
A B C  Media Report  Al JazeeraA B C  Media Report  Al Jazeera
A B C Media Report Al JazeeraSimon Gummer
 
Socio Economic Status Presentation
Socio Economic Status PresentationSocio Economic Status Presentation
Socio Economic Status Presentationmldunning
 
Socioeconomic status
Socioeconomic statusSocioeconomic status
Socioeconomic statusrhondak84
 
Socio economic classification
Socio economic classificationSocio economic classification
Socio economic classificationhari krishnan.n
 
How media producers define their target audience
How media producers define their target audienceHow media producers define their target audience
How media producers define their target audiencemattwako
 
Representation of gender and stereotypes
Representation of gender and stereotypesRepresentation of gender and stereotypes
Representation of gender and stereotypesLiz Davies
 
How media producers define their target audience
How media producers define their target audienceHow media producers define their target audience
How media producers define their target audienceCharlotte Jean
 

Viewers also liked (11)

A B C Media Report Al Jazeera
A B C  Media Report  Al JazeeraA B C  Media Report  Al Jazeera
A B C Media Report Al Jazeera
 
ABC powerpoint:)
ABC powerpoint:)ABC powerpoint:)
ABC powerpoint:)
 
Socio Economic Status Presentation
Socio Economic Status PresentationSocio Economic Status Presentation
Socio Economic Status Presentation
 
Socioeconomic status
Socioeconomic statusSocioeconomic status
Socioeconomic status
 
Socio economic classification
Socio economic classificationSocio economic classification
Socio economic classification
 
How media producers define their target audience
How media producers define their target audienceHow media producers define their target audience
How media producers define their target audience
 
Representation of gender and stereotypes
Representation of gender and stereotypesRepresentation of gender and stereotypes
Representation of gender and stereotypes
 
How media producers define their target audience
How media producers define their target audienceHow media producers define their target audience
How media producers define their target audience
 
2016 Digital Yearbook
2016 Digital Yearbook2016 Digital Yearbook
2016 Digital Yearbook
 
Digital in 2017 Global Overview
Digital in 2017 Global OverviewDigital in 2017 Global Overview
Digital in 2017 Global Overview
 
Digital in 2016
Digital in 2016Digital in 2016
Digital in 2016
 

Similar to Inferring the Socioeconomic Status of Social Media Users based on Behaviour and Language

INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Multiclass Recognition with Multiple Feature Trees
Multiclass Recognition with Multiple Feature TreesMulticlass Recognition with Multiple Feature Trees
Multiclass Recognition with Multiple Feature Treescsandit
 
Multiclass recognition with
Multiclass recognition withMulticlass recognition with
Multiclass recognition withcsandit
 
cs224w-79-final
cs224w-79-finalcs224w-79-final
cs224w-79-finalDarren Koh
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用Ryo Iwaki
 
Introduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptxIntroduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptxline24arts
 
Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...
Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...
Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...Waqas Tariq
 
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...Markus Luczak-Rösch
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...IRJET Journal
 
A Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationA Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationIJECEIAES
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodIJERA Editor
 
Stamps extraction using local adaptive k-means and ISODATA algorithms
Stamps extraction using local adaptive k-means and ISODATA algorithmsStamps extraction using local adaptive k-means and ISODATA algorithms
Stamps extraction using local adaptive k-means and ISODATA algorithmsnooriasukmaningtyas
 
Float Data Type in C.pdf
Float Data Type in C.pdfFloat Data Type in C.pdf
Float Data Type in C.pdfSudhanshiBakre1
 

Similar to Inferring the Socioeconomic Status of Social Media Users based on Behaviour and Language (20)

INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
StiglicM-poster-ang
StiglicM-poster-angStiglicM-poster-ang
StiglicM-poster-ang
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Multiclass Recognition with Multiple Feature Trees
Multiclass Recognition with Multiple Feature TreesMulticlass Recognition with Multiple Feature Trees
Multiclass Recognition with Multiple Feature Trees
 
Multiclass recognition with
Multiclass recognition withMulticlass recognition with
Multiclass recognition with
 
cs224w-79-final
cs224w-79-finalcs224w-79-final
cs224w-79-final
 
E1802023741
E1802023741E1802023741
E1802023741
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用
 
Data Science Machine
Data Science Machine Data Science Machine
Data Science Machine
 
Introduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptxIntroduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptx
 
Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...
Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...
Particle Swarm Optimization in the fine-tuning of Fuzzy Software Cost Estimat...
 
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
 
A Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationA Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity Propagation
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution Method
 
Stamps extraction using local adaptive k-means and ISODATA algorithms
Stamps extraction using local adaptive k-means and ISODATA algorithmsStamps extraction using local adaptive k-means and ISODATA algorithms
Stamps extraction using local adaptive k-means and ISODATA algorithms
 
Float Data Type in C.pdf
Float Data Type in C.pdfFloat Data Type in C.pdf
Float Data Type in C.pdf
 

More from Vasileios Lampos

Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...Vasileios Lampos
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applicationsVasileios Lampos
 
Mining online data for public health surveillance
Mining online data for public health surveillanceMining online data for public health surveillance
Mining online data for public health surveillanceVasileios Lampos
 
Mining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media contentMining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media contentVasileios Lampos
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...Vasileios Lampos
 
User-generated content: collective and personalised inference tasks
User-generated content: collective and personalised inference tasksUser-generated content: collective and personalised inference tasks
User-generated content: collective and personalised inference tasksVasileios Lampos
 
Bilinear text regression and applications
Bilinear text regression and applicationsBilinear text regression and applications
Bilinear text regression and applicationsVasileios Lampos
 
Extracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataExtracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataVasileios Lampos
 
Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...Vasileios Lampos
 

More from Vasileios Lampos (10)

Quicksort
QuicksortQuicksort
Quicksort
 
Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...Transfer learning for unsupervised influenza-like illness models from online ...
Transfer learning for unsupervised influenza-like illness models from online ...
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 
Mining online data for public health surveillance
Mining online data for public health surveillanceMining online data for public health surveillance
Mining online data for public health surveillance
 
Mining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media contentMining socio-political and socio-economic signals from social media content
Mining socio-political and socio-economic signals from social media content
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...
 
User-generated content: collective and personalised inference tasks
User-generated content: collective and personalised inference tasksUser-generated content: collective and personalised inference tasks
User-generated content: collective and personalised inference tasks
 
Bilinear text regression and applications
Bilinear text regression and applicationsBilinear text regression and applications
Bilinear text regression and applications
 
Extracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual dataExtracting interesting concepts from large-scale textual data
Extracting interesting concepts from large-scale textual data
 
Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...Assessing the impact of a health intervention via user-generated Internet con...
Assessing the impact of a health intervention via user-generated Internet con...
 

Recently uploaded

Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)itwameryclare
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 

Recently uploaded (20)

Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 

Inferring the Socioeconomic Status of Social Media Users based on Behaviour and Language

  • 1. Funded by the EPSRC IRC project “i-sense” Selected References Lampos et al. Predicting and Characterising User Impact on Twitter. EACL, 2014. Preotiuc-Pietro et al. An analysis of the user occupational class through Twitter content. ACL, 2015. Preotiuc-Pietro et al. Studying User Income through Language, Behaviour and Affect in Social Media. PLoS ONE, 2015. Rasmussen and Williams. Gaussian Processes for Machine Learning. MIT Press, 2006. Download the data set Summary. We present a method for determining the socioeconomic status of a social media (Twitter) user. Initially, we formulate a 3-way classification task, where users are classified as having an upper, middle or lower socioeconomic status. A nonlinear learning approach using a composite Gaussian Process kernel provides a classification accuracy of 75%. By turning this task into a binary classification – upper vs. medium and lower class – the proposed classifier reaches an accuracy of 82%. Profile description football player at Liverpool FC tweets from the best barista in London estate agent, stamp collector & proud mother (523 1-grams & 2-grams) Behaviour % re-tweets
 % @mentions
 % unique @mentions % @replies Impact # followers
 # followees
 # listed
 ‘impact’ score Topics of discussion Corporate
 Education
 Internet Slang
 Politics
 Shopping
 Sports
 Vacation
 … (200 topics) Text in posts (560 1-grams) c1 c2 c3 c4 c5 Twitter user attributes (feature categories) A How is a user profile mapped to a socioeconomic status? Profile description on Twitter Occupation SOC category1 NS-SEC2 1. Standard Occupational Classification: 369 job groupings 2. National Statistics Socio-Economic Classification: Map from the job groupings in SOC to a socioeconomic status, i.e. {upper, middle or lower} B Data sets T1: 1,342 Twitter user profiles, 2 million tweets, from February 1, 2014 to March 21, 2015; profiles are labelled with a socio- economic status T2: 160 million tweets, sample of UK Twitter, same date range with T1, used to learn a set of 200 latent topics C 1-gram samples from a subset of the 200 latent topics (word clusters) ex- utomatically from Twitter data (D2). ic Sample of 1-grams rate #business, clients, development, marketing, o ces, product tion assignments, coursework, dissertation, essay, library, notes, studies ily #family, auntie, dad, family, mother, nephew, sister, uncle Slang ahahaha, awwww, hahaa, hahahaha, hmmmm, loooool, oooo, yay ics #labour, #politics, #tories, conservatives, democracy, voters ping #shopping, asda, bargain, customers, market, retail, shops, toys rts #football, #winner, ball, bench, defending, footballer, goal, won rtime #beach, #sea, #summer, #sunshine, bbq, hot, seaside, swimming rism #jesuischarlie, cartoon, freedom, religion, shootings, terrorism ams) and 560 (1-grams) respectively. Thus, a Twitter user in our data resented by a 1, 291-dimensional feature vector. pplied spectral clustering [12] on D2 to derive 200 (hard) clusters of that capture a number of latent topics and linguistic expressions (e.g. , ‘Sports’, ‘Internet Slang’), a snapshot of which is presented in Ta- evious research has shown that this amount of clusters is adequate for g a strong performance in similar tasks [7,13,14]. We then computed the y of each topic in the tweets of D1 as described in feature category c5. btain a SES label for each user account, we took advantage of the SOC y’s characteristics [5]. In SOC, jobs are categorised based on the required l and specialisation. At the top level, there exist 9 general occupation and the scheme breaks down to sub-categories forming a 4-level struc- e bottom of this hierarchy contains more specific job groupings (369 in OC also provides a simplified mapping from these job groupings to a efined by NS-SEC [17]. We used this mapping to assign an upper, mid- wer SES to each user account in our data set. This process resulted in and 314 users in the upper, middle and lower SES classes, respectively.2 assification Methods a composite Gaussian Process (GP), described below, as our main for performing classification. GPs can be defined as sets of random , any finite number of which have a multivariate Gaussian distribution mally, GP methods aim to learn a function f : Rd ! R drawn from a given the inputs x 2 Rd : f(x) ⇠ GP(m(x), k(x, x0 )) , (1) (·) is the mean function (here set equal to 0) and k(·, ·) is the covari- nel. We apply the squared exponential (SE) kernel, also known as the ta set is available at http://dx.doi.org/10.6084/m9.figshare.1619703. subset of the 200 latent topics (word clusters) ex- r data (D2). Sample of 1-grams ents, development, marketing, o ces, product rsework, dissertation, essay, library, notes, studies tie, dad, family, mother, nephew, sister, uncle w, hahaa, hahahaha, hmmmm, loooool, oooo, yay itics, #tories, conservatives, democracy, voters a, bargain, customers, market, retail, shops, toys ner, ball, bench, defending, footballer, goal, won summer, #sunshine, bbq, hot, seaside, swimming cartoon, freedom, religion, shootings, terrorism ) respectively. Thus, a Twitter user in our data mensional feature vector. ng [12] on D2 to derive 200 (hard) clusters of of latent topics and linguistic expressions (e.g. ng’), a snapshot of which is presented in Ta- wn that this amount of clusters is adequate for n similar tasks [7,13,14]. We then computed the weets of D1 as described in feature category c5. ch user account, we took advantage of the SOC SOC, jobs are categorised based on the required the top level, there exist 9 general occupation down to sub-categories forming a 4-level struc- hy contains more specific job groupings (369 in plified mapping from these job groupings to a We used this mapping to assign an upper, mid- ccount in our data set. This process resulted in per, middle and lower SES classes, respectively.2 ds Process (GP), described below, as our main ation. GPs can be defined as sets of random which have a multivariate Gaussian distribution to learn a function f : Rd ! R drawn from a d : ⇠ GP(m(x), k(x, x0 )) , (1) n (here set equal to 0) and k(·, ·) is the covari- red exponential (SE) kernel, also known as the ://dx.doi.org/10.6084/m9.figshare.1619703. Table 1. 1-gram samples from a subset of the 200 latent topics (word clusters) ex- tracted automatically from Twitter data (D2). Topic Sample of 1-grams Corporate #business, clients, development, marketing, o ces, product Education assignments, coursework, dissertation, essay, library, notes, studies Family #family, auntie, dad, family, mother, nephew, sister, uncle Internet Slang ahahaha, awwww, hahaa, hahahaha, hmmmm, loooool, oooo, yay Politics #labour, #politics, #tories, conservatives, democracy, voters Shopping #shopping, asda, bargain, customers, market, retail, shops, toys Sports #football, #winner, ball, bench, defending, footballer, goal, won Summertime #beach, #sea, #summer, #sunshine, bbq, hot, seaside, swimming Terrorism #jesuischarlie, cartoon, freedom, religion, shootings, terrorism plus 2-grams) and 560 (1-grams) respectively. Thus, a Twitter user in our data set is represented by a 1, 291-dimensional feature vector. We applied spectral clustering [12] on D2 to derive 200 (hard) clusters of 1-grams that capture a number of latent topics and linguistic expressions (e.g. ‘Politics’, ‘Sports’, ‘Internet Slang’), a snapshot of which is presented in Ta- ble 1. Previous research has shown that this amount of clusters is adequate for achieving a strong performance in similar tasks [7,13,14]. We then computed the frequency of each topic in the tweets of D1 as described in feature category c5. To obtain a SES label for each user account, we took advantage of the SOC hierarchy’s characteristics [5]. In SOC, jobs are categorised based on the required skill level and specialisation. At the top level, there exist 9 general occupation groups, and the scheme breaks down to sub-categories forming a 4-level struc- ture. The bottom of this hierarchy contains more specific job groupings (369 in total). SOC also provides a simplified mapping from these job groupings to a SES as defined by NS-SEC [17]. We used this mapping to assign an upper, mid- dle or lower SES to each user account in our data set. This process resulted in 710, 318 and 314 users in the upper, middle and lower SES classes, respectively.2 3 Classification Methods We use a composite Gaussian Process (GP), described below, as our main method for performing classification. GPs can be defined as sets of random variables, any finite number of which have a multivariate Gaussian distribution [16]. Formally, GP methods aim to learn a function f : Rd ! R drawn from a GP prior given the inputs x 2 Rd : f(x) ⇠ GP(m(x), k(x, x0 )) , (1) where m(·) is the mean function (here set equal to 0) and k(·, ·) is the covari- ance kernel. We apply the squared exponential (SE) kernel, also known as the 2 The data set is available at http://dx.doi.org/10.6084/m9.figshare.1619703. Definition: Kernel formulation: ation mean performance as estimated via a 10-fold cross valida- GP classifier for both problem specifications. Parentheses hold timate. Accuracy Precision Recall F-score 5.09% (3.28%) 72.04% (4.40%) 70.76% (5.65%) .714 (.049) 2.05% (2.41%) 82.20% (2.39%) 81.97% (2.55%) .821 (.025) (RBF), defined as kSE(x, x0 ) = ✓2 exp kx x0 k2 2/(2`2 ) , nt that describes the overall level of variance and ` is re- acteristic length-scale parameter. Note that ` is inversely redictive relevancy of x (high values indicate a low degree classification using GPs ‘squashes’ the real valued latent through a logistic function: ⇡(x) , P(y = 1|x) = (f(x)) ogistic regression classification. In binary classification, the latent f⇤ is combined with the logistic function to produceR (f⇤)P(f⇤|x, y, x⇤)df⇤. The posterior formulation has a od and thus, the model parameters can only be estimated. use the Laplace approximation [16,18]. Table 2. SES classification mean performance as estimated via a 10-fold cross valida- tion of the composite GP classifier for both problem specifications. Parentheses hold the SD of the mean estimate. Num. of classes Accuracy Precision Recall F-score 3 75.09% (3.28%) 72.04% (4.40%) 70.76% (5.65%) .714 (.049) 2 82.05% (2.41%) 82.20% (2.39%) 81.97% (2.55%) .821 (.025) radial basis function (RBF), defined as kSE(x, x0 ) = ✓2 exp kx x0 k2 2/(2`2 ) , where ✓2 is a constant that describes the overall level of variance and ` is re- ferred to as the characteristic length-scale parameter. Note that ` is inversely proportional to the predictive relevancy of x (high values indicate a low degree of relevance). Binary classification using GPs ‘squashes’ the real valued latent function f(x) output through a logistic function: ⇡(x) , P(y = 1|x) = (f(x)) in a similar way to logistic regression classification. In binary classification, the distribution over the latent f⇤ is combined with the logistic function to produce the prediction ¯⇡⇤ = R (f⇤)P(f⇤|x, y, x⇤)df⇤. The posterior formulation has a non-Gaussian likelihood and thus, the model parameters can only be estimated. For this purpose we use the Laplace approximation [16,18]. Based on the property that the sum of covariance functions is also a valid covariance function [16], we model the di↵erent user feature categories with a di↵erent SE kernel. The final covariance function, therefore, becomes k(x, x0 ) = CX n=1 kSE(cn, c0 n) ! + kN(x, x0 ) , (2) where cn is used to express the features of each category, i.e., x = {c1, . . . , cC,}, C is equal to the number of feature categories (in our experimental setup, C = 5) and kN(x, x0 ) = ✓2 N ⇥ (x, x0 ) models noise ( being a Kronecker delta func- tion). Similar GP kernel formulations have been applied for text regression tasks [7,9,11] as a way of capturing groupings of the feature space more e↵ectively. Although related work has indicated the superiority of nonlinear approaches in similar multimodal tasks [7,14], we also estimate a performance baseline us- ing a linear method. Given the high dimensionality of our task, we apply logistic regression with elastic net regularisation [6] for this purpose. As both classifica- tion techniques can address binary tasks, we adopt the one–vs.–all strategy for conducting an inference. Table 2. SES classification mean performance as estimated via a 10-fold cross valida- tion of the composite GP classifier for both problem specifications. Parentheses hold the SD of the mean estimate. Num. of classes Accuracy Precision Recall F-score 3 75.09% (3.28%) 72.04% (4.40%) 70.76% (5.65%) .714 (.049) 2 82.05% (2.41%) 82.20% (2.39%) 81.97% (2.55%) .821 (.025) radial basis function (RBF), defined as kSE(x, x0 ) = ✓2 exp kx x0 k2 2/(2`2 ) , where ✓2 is a constant that describes the overall level of variance and ` is re- ferred to as the characteristic length-scale parameter. Note that ` is inversely proportional to the predictive relevancy of x (high values indicate a low degree of relevance). Binary classification using GPs ‘squashes’ the real valued latent function f(x) output through a logistic function: ⇡(x) , P(y = 1|x) = (f(x)) in a similar way to logistic regression classification. In binary classification, the distribution over the latent f⇤ is combined with the logistic function to produce the prediction ¯⇡⇤ = R (f⇤)P(f⇤|x, y, x⇤)df⇤. The posterior formulation has a non-Gaussian likelihood and thus, the model parameters can only be estimated. For this purpose we use the Laplace approximation [16,18]. Based on the property that the sum of covariance functions is also a valid covariance function [16], we model the di↵erent user feature categories with a di↵erent SE kernel. The final covariance function, therefore, becomes k(x, x0 ) = CX n=1 kSE(cn, c0 n) ! + kN(x, x0 ) , (2) where cn is used to express the features of each category, i.e., x = {c1, . . . , cC,}, C is equal to the number of feature categories (in our experimental setup, C = 5) and kN(x, x0 ) = ✓2 N ⇥ (x, x0 ) models noise ( being a Kronecker delta func- tion). Similar GP kernel formulations have been applied for text regression tasks [7,9,11] as a way of capturing groupings of the feature space more e↵ectively. Although related work has indicated the superiority of nonlinear approaches in similar multimodal tasks [7,14], we also estimate a performance baseline us- ing a linear method. Given the high dimensionality of our task, we apply logistic regression with elastic net regularisation [6] for this purpose. As both classifica- tion techniques can address binary tasks, we adopt the one–vs.–all strategy for conducting an inference. ce as estimated via a 10-fold cross valida- problem specifications. Parentheses hold ecision Recall F-score % (4.40%) 70.76% (5.65%) .714 (.049) % (2.39%) 81.97% (2.55%) .821 (.025) SE(x, x0 ) = ✓2 exp kx x0 k2 2/(2`2 ) , e overall level of variance and ` is re- le parameter. Note that ` is inversely of x (high values indicate a low degree GPs ‘squashes’ the real valued latent unction: ⇡(x) , P(y = 1|x) = (f(x)) ssification. In binary classification, the d with the logistic function to produce )df⇤. The posterior formulation has a odel parameters can only be estimated. roximation [16,18]. of covariance functions is also a valid di↵erent user feature categories with a function, therefore, becomes n, c0 n) ! + kN(x, x0 ) , (2) each category, i.e., x = {c1, . . . , cC}, C ies (in our experimental setup, C = 5) oise ( being a Kronecker delta func- e been applied for text regression tasks of the feature space more e↵ectively. the superiority of nonlinear approaches so estimate a performance baseline us- nsionality of our task, we apply logistic [6] for this purpose. As both classifica- , we adopt the one–vs.–all strategy for erformance as estimated via a 10-fold cross valida- for both problem specifications. Parentheses hold Precision Recall F-score ) 72.04% (4.40%) 70.76% (5.65%) .714 (.049) ) 82.20% (2.39%) 81.97% (2.55%) .821 (.025) ned as kSE(x, x0 ) = ✓2 exp kx x0 k2 2/(2`2 ) , ribes the overall level of variance and ` is re- ngth-scale parameter. Note that ` is inversely evancy of x (high values indicate a low degree n using GPs ‘squashes’ the real valued latent ogistic function: ⇡(x) , P(y = 1|x) = (f(x)) sion classification. In binary classification, the combined with the logistic function to produce |x, y, x⇤)df⇤. The posterior formulation has a , the model parameters can only be estimated. ace approximation [16,18]. he sum of covariance functions is also a valid el the di↵erent user feature categories with a ariance function, therefore, becomes X 1 kSE(cn, c0 n) ! + kN(x, x0 ) , (2) tures of each category, i.e., x = {c1, . . . , cC}, C categories (in our experimental setup, C = 5) odels noise ( being a Kronecker delta func- ons have been applied for text regression tasks upings of the feature space more e↵ectively. dicated the superiority of nonlinear approaches ], we also estimate a performance baseline us- gh dimensionality of our task, we apply logistic isation [6] for this purpose. As both classifica- y tasks, we adopt the one–vs.–all strategy for where Formulating a Gaussian Process classifier E Topics (word clusters) are formed by applying 
 spectral clustering on daily word frequencies in T2. Examples of topics with word samples Corporate: #business, clients, development, marketing, offices Education: assignments, coursework, dissertation, essay, library Internet Slang: ahahaha, awwww, hahaa, hahahaha, hmmmm Politics: #labour, #politics, #tories, conservatives, democracy Shopping: #shopping, asda, bargain, customers, market, retail Sports: #football, #winner, ball, bench, defending, footballer D Classification Accuracy (%) Precision (%) Recall (%) F1 2-way 82.05 (2.4) 82.2 (2.4) 81.97 (2.6) .821 (.03) 3-way 75.09 (3.3) 72.04 (4.4) 70.76 (5.7) .714 (.05) Classification performance (10-fold CV) T1 T2 P O1 584 115 83.5% O2 126 517 80.4% R 82.3% 81.8% 82.0% T1 T2 T3 P O1 606 84 53 81.6% O2 49 186 45 66.4% O3 55 48 216 67.7% R 854% 58.5% 68.8% 75.1% Confusion matrices (aggregate) O = output (inferred), T = target, P = precision, R = recall {1, 2, 3} = {upper, middle, lower} socioeconomic status F Conclusions. (a) First approach for inferring the socioeconomic status of a social media user, (b) 75% & 82% accuracy for the 3- way and binary classification tasks respectively, and (c) future work is required to evaluate this framework more rigorously and to analyse underlying qualitative properties in detail. Inferring the Socioeconomic Status of Social Media Users based on Behaviour & Language Vasileios Lampos, Nikolaos Aletras, Jens K. Geyti, Bin Zou & Ingemar J. Cox