Cultural and Geolocation Aspects of Communication in Twitter
Elena Daehnhardt (1), Yanguo Jing (2), Nick Taylor (1)
1. Heriot-Watt University   2. London Metropolitan University
About the Project
The main goals: to find out how cultural differences impact user behaviour on Twitter and exploit these differences in adaptation:

•  Cultural Behaviour Patterns in Microblogs
•  Communication in Twitter (this paper)
•  Exploiting Cultural Differences in Recommenders (further work)
Outline
•  Research Questions
•  Approach
•  Experimental Setup
•  Findings
•  Further Work
The Main Idea
•  To understand if communication on Twitter is influenced by user origins.
•  Why? Potentially, we could use this knowledge for friends/content recommendations (TODO).
•  What do we need? Country locations are required.
•  In the previous paper, cultural microblogging patterns enabled us to predict user countries; this is a challenge for a larger set of countries.
Research Questions
•  How can we exploit Twitter to infer user countries?
•  Which features are important for inferring countries?
•  Which friend features are important for predicting follower responses?
Approach: Data Collection & Country Prediction
[Diagram: processing steps and data stores]
•  Follow Selected Twitter Users
•  Store Tweets of Users and Their Followers
•  Create User Profiles of the Selected Users
•  Create Country Prediction Model
•  Predict Follower Locations and Create their Profiles
Data stores: “Follow” (tweets), User Profiles, Prediction Model, FOLLOWERS
Approach: Response Prediction & Evaluation
[Diagram: processing steps and data stores]
•  Create Communication Dataset
•  Significance of User and Follower Features
•  Create and Evaluate Prediction Models of Follower Responses
Data stores: “Follow” (tweets), User Profiles, FOLLOWERS, “Communication” (Users & Follower Responses)
Geographic Locations
Linear-Active   US, UK, Canada
Reactive        Indonesia, Japan, Malaysia
Multi-Active    Brazil, Italy, France, Russia, Spain, Turkey, Mexico
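The grouping above can be expressed as a simple lookup. The sketch below is only an illustration: the names `CULTURAL_GROUP` and `dim_match` are hypothetical, and the assumption that the DimMatch parameter in Table 4 compares the groups of a user and a follower in this way is ours, not something the slides state.

```python
# Country-to-group lookup copied from the slide's table.
# The dictionary name and the helper below are hypothetical.
CULTURAL_GROUP = {
    "US": "Linear-Active", "UK": "Linear-Active", "Canada": "Linear-Active",
    "Indonesia": "Reactive", "Japan": "Reactive", "Malaysia": "Reactive",
    "Brazil": "Multi-Active", "Italy": "Multi-Active", "France": "Multi-Active",
    "Russia": "Multi-Active", "Spain": "Multi-Active", "Turkey": "Multi-Active",
    "Mexico": "Multi-Active",
}


def dim_match(user_country, follower_country):
    """Return True when both countries are known and share a group."""
    user_group = CULTURAL_GROUP.get(user_country)
    follower_group = CULTURAL_GROUP.get(follower_country)
    return user_group is not None and user_group == follower_group


if __name__ == "__main__":
    print(dim_match("US", "UK"))      # same group
    print(dim_match("US", "Japan"))   # different groups
```

Countries outside the thirteen listed ones return no match, which is one possible design choice; the slides do not say how such cases were treated.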
User Profiles
Feature Set   Description
LANGUAGE      User language in the Twitter profile
BEHAVIOR      Time of posting, user influence, hashtag usage, …
META          LANGUAGE + most used time zone and location
LOCATION      Location field found in the user meta-data
CONTENT       Content of a single tweet
FOLLOWERS     Most frequent country, time zone and average followers' influence, …
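As an illustration of the FOLLOWERS feature set, the sketch below aggregates a list of follower records into the most frequent country, the most frequent time zone and the average influence. The record layout (`country`, `timezone`, `influence` keys) and the function name are assumptions made for this example; the output keys borrow the FCountry, FTimezone and FInfluence names from the feature importance table, but the exact definitions used in the study are not given on the slides.

```python
from collections import Counter
from statistics import mean


def followers_profile(followers):
    """Summarise follower records into a FOLLOWERS-style feature set.

    `followers` is a list of dicts with hypothetical keys
    'country', 'timezone' and 'influence'.
    """
    if not followers:
        return None  # no followers, nothing to aggregate
    countries = Counter(f["country"] for f in followers)
    timezones = Counter(f["timezone"] for f in followers)
    return {
        "FCountry": countries.most_common(1)[0][0],
        "FTimezone": timezones.most_common(1)[0][0],
        "FInfluence": mean(f["influence"] for f in followers),
    }


if __name__ == "__main__":
    sample = [
        {"country": "UK", "timezone": "London", "influence": 0.2},
        {"country": "UK", "timezone": "London", "influence": 0.4},
        {"country": "US", "timezone": "Eastern", "influence": 0.6},
    ]
    print(followers_profile(sample))
```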
User-related
H1.1. The information on country location derived from user tweets' meta-data and respective meta-data of the followers is not sufficient for predicting countries ✔
Languages in Twitter Profiles
[Figure: languages in Twitter profiles; axes: Number of Users, %]
H2.1 A user's contact network can assist in improving country prediction. ✔
H2.2 User BEHAVIOR patterns can assist in further improving locality prediction. ✗
Feature Set             CV Acc.    Accuracy   Precision      Recall          F1
                      α=1     α   α=1     α   α=1     α   α=1     α   α=1     α
User-related Data
LANGUAGE (LANG.)      .88   .76   .88   .76   .78   .70   .88   .76   .82   .71
LOCATION              .62   .58   .64   .59   .63   .66   .64   .59   .53   .51
META                  .91   .85   .91   .85   .90   .86   .91   .85   .90   .83
BEHAVIOR+LANG.        .80   .66   .81   .66   .81   .67   .81   .66   .81   .67
CONTENT               .63   .58   .65   .58   .55   .45   .65   .58   .54   .47
BEHAVIOR (BEH)        .44   .38   .46   .38   .48   .39   .46   .38   .47   .39
FOLLOWERS (FOL)       .88   .84   .88   .84   .88   .84   .88   .84   .88   .84
Mixed Data
LANG.+FOL.            .94   .87   .94   .87   .94   .87   .94   .87   .94   .87
BEH.+FOL.             .87   .82   .88   .83   .88   .83   .88   .83   .88   .83
BEH.+FOL.+LANG.       .92   .87   .91   .87   .91   .87   .91   .87   .91   .87
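In every row of the table, Recall equals Accuracy. That is what support-weighted averaging of per-class scores produces in a multi-class setting, so the sketch below assumes the reported Precision, Recall and F1 are weighted averages over countries; the slides do not confirm this. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), the last three
    averaged over classes with weights equal to class support."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0   # per-class precision
        r_c = tp / count                             # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = ["UK", "UK", "US", "JP"]
    guess = ["UK", "US", "US", "US"]
    print(weighted_scores(truth, guess))
```

With this averaging, the weighted recall is the sum of true positives divided by the number of samples, which is the accuracy by definition. The weighted F1 is not the harmonic mean of the weighted precision and recall, which would explain rows such as LOCATION where F1 is lower than both.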
Country Prediction
Features Importance
Feature      Importance   Feature      Importance
Language     100          FCountry     16.16
FTimezone    15.36        FInfluence   2.34
Weekends     2.34         Influence    2.13
Mentions     1.55         Tagging      0.98
Response     0.73         FTimezones   0.51
Timezones    0.36         Languages    0.33
FLanguage    0.32         FLanguages   0.26
FCountries   0.15         Mobility     0.02
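The importance values are expressed relative to the strongest feature, which scores 100. A minimal sketch of such a rescaling follows, assuming raw importance scores from some model are simply divided by the largest one; the function name and the raw values are invented for illustration and are not taken from the study.

```python
def relative_importance(raw):
    """Rescale raw importance scores so that the largest becomes 100."""
    top = max(raw.values())
    return {name: round(100 * value / top, 2) for name, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical raw scores, not the values behind the table.
    raw_scores = {"Language": 0.40, "FCountry": 0.10, "Mobility": 0.02}
    for name, score in relative_importance(raw_scores).items():
        print(name, score)
```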
(Country Predictions)
Countries of Repliers
Predicting Follower Replies
Which friend features are important for predicting a user's follower responses?

Matches between the country locations and the languages of a user and her followers are amongst the most important prediction parameters for user responses. ✗

A user's influence is significant in predicting her followers' responses. ✗
Predicting Follower Replies
Table 4: Relative Features Importance (RFI) in Followers' Response Prediction Test using Decision Trees, Logistic Regression Analysis Results (Predicted Logit of Interest) and Logit Marginal Effects; results with p < 0.05 are statistically significant.
Logistic regression: pseudo R² ≈ 0.23, sensitivity ≈ 62, specificity ≈ 72.
RFI = relative feature importance (decision tree); OR = odds ratio; Coef. = logit coefficient; SE = standard error; CI lo / CI hi = bounds of the 95% confidence interval.

                D. Tree  Logistic Regression Results                             Marginal Effects
Parameter           RFI      OR       z   P>|z|   Coef.      SE   CI lo   CI hi   dy/dx      SE       z   P>|z|   CI lo   CI hi
Intercept                 172.7    2.61    0.01    5.15    1.97    1.29    9.02
CountryMatch*      2.22    0.46   -1.48    0.14   -0.78    0.53   -1.82    0.25   -0.14    0.09   -1.49    0.14   -0.33    0.04
DimMatch*          3.91    4.45    2.31    0.02    1.49    0.64    0.23    2.76    0.23    0.11    2.35    0.02    0.04    0.49
FCMatch*           9.63    0.68   -1.96    0.05   -0.38    0.19   -0.76   -0.00   -0.07    0.03   -1.98    0.05   -0.14   -0.00
FLangMatch*         100    0.09  -10.03    0.00   -2.41    0.24   -2.88   -1.94   -0.43    0.03  -14.23    0.00   -0.49   -0.37
FTimezMatch*       7.20    1.20    0.86    0.39    0.18    0.21   -0.23    0.59    0.03    0.04    0.86    0.39   -0.04    0.11
LangMatch*         6.50    0.79   -0.67    0.50   -0.24    0.35   -0.93    0.46   -0.04    0.06   -0.67    0.50   -0.17    0.08
FCountries        24.79    1.16    1.44    0.15    0.15    0.11   -0.05    0.36    0.03    0.02    1.45    0.15   -0.01    0.06
FInfluence        74.49    0.26   -2.15    0.03   -1.36    0.63   -2.59   -0.12   -0.23    0.11   -2.17    0.03   -0.46   -0.02
FLanguages        56.84    0.85   -3.05    0.00   -0.16    0.05   -0.26   -0.06   -0.03    0.01   -3.12    0.00   -0.05   -0.01
FTimezones        25.78    0.98   -0.76    0.44   -0.02    0.03   -0.07    0.03   -0.00    0.01   -0.77    0.00   -0.01    0.01
Influence         70.61    0.94   -0.08    0.93   -0.06    0.73   -1.49    1.37   -0.01    0.13   -0.08    0.93   -0.27    0.25
Languages         23.37    0.78   -2.88    0.00   -0.24    0.08   -0.41   -0.08   -0.04    0.01   -2.94    0.00   -0.07   -0.01
Mentions           2.08    0.64   -0.62    0.53   -0.45    0.72   -1.86    0.96   -0.08    0.13   -0.62    0.53   -0.34    0.17
Mobility           2.96    0.14   -1.78    0.07   -1.97    1.10   -4.13    0.19   -0.35    0.20    -1.8    0.07   -0.74    0.03
Response          28.60    1.08    0.05    0.96    0.08    1.59   -3.04    3.20    0.01    0.29    0.05    0.96   -0.55    0.58
Tagging           27.93    1.11    0.31    0.75    0.10    0.32   -0.53    0.73     0.2    0.06    0.31    0.75   -0.09    0.13
Timezones         10.82    0.69   -1.65    0.10   -0.37    0.22   -0.81    0.07   -0.07    0.04   -1.66    0.09   -0.14    0.01
Weekends          77.61    2.04    1.37    0.17    0.71    0.52   -0.30    1.73    0.13    0.09    1.38    0.17   -0.05    0.31
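The logistic regression columns of Table 4 are linked by two standard identities: the odds ratio is the exponential of the logit coefficient, and the z statistic is the coefficient divided by its standard error. The sketch below checks both on a few rows. It assumes that the fifth value in each parameter row (the one that follows the first P > |z| entry) is the logit coefficient, which the extracted header does not label; the helper names are hypothetical.

```python
import math


def odds_ratio(coef):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(coef)


def wald_z(coef, std_err):
    """Wald z statistic for a coefficient and its standard error."""
    return coef / std_err


if __name__ == "__main__":
    # (coefficient, standard error, reported odds ratio, reported z) from Table 4
    rows = {
        "DimMatch": (1.49, 0.64, 4.45, 2.31),
        "FLangMatch": (-2.41, 0.24, 0.09, -10.03),
        "FInfluence": (-1.36, 0.63, 0.26, -2.15),
    }
    for name, (coef, se, reported_or, reported_z) in rows.items():
        print(name,
              "OR", round(odds_ratio(coef), 2), "vs", reported_or,
              "| z", round(wald_z(coef, se), 2), "vs", reported_z)
```

Small differences from the reported values are expected, because the table lists coefficients and standard errors rounded to two decimals.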
… with the highest rank of “interestingness”. Out of 106775 ranks, only 343 ranks were of 0 value (no interest). This is why for creating our model …
… tics based on decision trees presented CountryMatch, DimMatch and LangMatch within the five least important features set. This is why we could not ac…
Findings
User Country Prediction:

•  Information publicly available in user meta-data and followers' network enabled us to predict user country locations with an accuracy of > 90%
•  The most successful feature combinations included Followers-related data such as their Country, Timezone and Influence, and User-related data such as Language defined in User Profile, Influence and Tweeting on Weekends.
Findings
Predicting User Replies

•  Dimension Match was more important than Country Match, and also statistically significant, leading to improved probability of user replies.
•  Users with more influential followers might get fewer replies.
•  When users have followers with a matching majority language, their reply probability drops by 43%.
Further Work
•  Analyze microblogging activities in the long term.
•  Evaluate recommendation strategies considering cultural origins of users.
•  Investigate in depth more content-related features, including contents of micro posts and hashtags
Related Paper
E. Ilina (Daehnhardt). A User Modeling Oriented Analysis of Cultural Backgrounds in Microblogging. HUMAN JOURNAL, 1(4):166–181, 2012.
Your Questions?
elena@daehnhardt.com
