Design and Multilingual Users
on Twitter and Wikipedia
Scott A. Hale
scott.hale@oii.ox.ac.uk
http://www.scotthale.net/
Oxford Internet Institute
University of Oxford
17 June 2014
Scott A. Hale Design and Multilingual Users
Importance of design
Content is diverse across languages
“multilingualism...[is] the norm for most of the world’s societies” (Birner,
2005), with over half of Europe and over a fifth of the US multilingual
(Erard, 2012); yet, many platforms are designed only with monolingual users
in mind.
In a survey in Uzbekistan, Internet users reported accessing content in
foreign languages even while simultaneously reporting poor foreign
language skills (Wei & Kolko, 2005)
Users often contribute local content/knowledge (Hecht & Gergle,
2010a)
Large diversity in information between languages (Hecht & Gergle,
2010b)
Can lead to self-focus bias (Hecht & Gergle, 2009)
Motivations
Language clustering vs. small-worlds
Users thought to cluster by language in most online platforms (Barnett
& Choi, 1995; Hale, 2012a, 2012b; Herring et al., 2007; Nordenstreng
& Varis, 1974; Takhteyev et al., 2011; Wilkinson & Thelwall, 2012)
Many online platforms thought to exhibit the ‘small-world’ phenomenon
of small path lengths between users (despite high clustering)
Role of multilingual users
⇒ If users cluster by language and platforms are small-worlds, there must
be brokers bridging different language groups (spanning structural
holes)
Multilingual users are possible bridge users. Only one study has
investigated this: an ego-network-level study of Twitter following–follower
network structure (Eleta & Golbeck, 2012).
No multiplatform study and no large-scale study exists
Outline
What are the roles of multilinguals and platform design in shaping the
spread of information in social media?
Twitter and Wikipedia at a global level
1 Language will have a strong role in structuring the platform
2 Users engaging with content in multiple languages (multilingual users)
serve as bridges between different clusters/editions
3 Users primarily writing in less-represented languages will be more likely
to cross language boundaries than users writing in highly-represented
languages
4 When users cross languages they will cross to larger languages (e.g.
English) and thus at a language level English will form more bridges
than other languages
Data
Twitter
Mentions and retweet network
18 days of the ‘spritzer’ 1% sample stream from June 2011
7,341,271 nodes; 8,545,693 directed, weighted edges
Wikipedia
Edits from the top 46 language editions
8 July to 9 August 2013
3.5 million non-minor edits by 55,568 registered users
Global Connectivity and Multilinguals in the Twitter Network (2014).
http://www.scotthale.net/pubs/?chi2014
Multilinguals and Wikipedia Editing (2014).
http://www.scotthale.net/pubs/?websci2014
Twitter: Data cleaning
Language classification
Clean text of tweets for language
detection (remove urls,
usernames, emoticons)
Use Chromium Compact
Language Detection kit for
language detection (Graham et
al., 2013)
Remove users with fewer than 2
tweets or less than 20% of the user’s
tweets in one language
Remove users with fewer than four
tweets total
Bots and spam users
Remove users with no mentions
(indegree=0)
Select only the largest
weakly-connected component
(88% of nodes)
End result
916,836 nodes (users) and 2,652,618
directed edges (mentions/retweets)
Each user is assigned a most-used
language and the fraction [0–1] of
their tweets written in that language
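The text cleaning and per-user language assignment listed above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the regular expressions, the function names `clean_tweet` and `language_profile`, and the reading of the thresholds (at least four tweets overall, and a most-used language covering at least 2 tweets and at least 20% of the user's tweets) are all assumptions. Language labels are taken as given; the detector itself is not shown.

```python
import re
from collections import Counter

# Illustrative patterns only (assumptions, not the study's exact rules).
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[)(DPp]")


def clean_tweet(text):
    """Strip URLs, @usernames and simple emoticons before language detection."""
    for pattern in (URL_RE, USER_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())  # collapse leftover whitespace


def language_profile(labels, min_tweets=4, min_count=2, min_share=0.2):
    """Return (most-used language, share of tweets in it), or None if filtered out.

    `labels` holds one detected language code per tweet for a single user.
    """
    if len(labels) < min_tweets:
        return None
    language, count = Counter(labels).most_common(1)[0]
    share = count / len(labels)
    if count < min_count or share < min_share:
        return None
    return language, share
```

For example, a user with the labels `["en", "en", "ja", "en"]` would be assigned English with a share of 0.75, while a user with only two tweets would be dropped.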
Wikipedia: Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the
month
Require at least four edits and at least 2 edits to one edition
Matching users and articles across languages
Look for common usernames across language editions
Check usernames are indeed linked global accounts
WikiData dump to match articles across languages
55,568 users (excluding Simple English edition) with a total of 3,518,955
edits.
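As a rough sketch of the matching and filtering steps above, the snippet below builds a lookup from (edition, title) to a shared item identifier and summarises each user's editions. The input shapes, the names `build_article_index` and `summarize_users`, and the toy identifier `item-1` are hypothetical; usernames are assumed to have already been verified as linked global accounts, and bot and anonymous edits are assumed to be filtered out beforehand.

```python
from collections import Counter, defaultdict


def build_article_index(sitelinks_by_item):
    """Map (edition, title) -> item ID so articles can be matched across languages.

    `sitelinks_by_item` is assumed to look like {"item-1": {"en": "Berlin", ...}}.
    """
    index = {}
    for item_id, sitelinks in sitelinks_by_item.items():
        for edition, title in sitelinks.items():
            index[(edition, title)] = item_id
    return index


def summarize_users(edits, min_edits=4, min_in_one_edition=2):
    """Summarise users from (username, edition) pairs, one pair per edit.

    Keeps users with at least `min_edits` edits in total and at least
    `min_in_one_edition` edits in their most-edited (primary) edition.
    """
    per_user = defaultdict(Counter)
    for username, edition in edits:
        per_user[username][edition] += 1

    summary = {}
    for username, counts in per_user.items():
        primary, top = counts.most_common(1)[0]
        if sum(counts.values()) < min_edits or top < min_in_one_edition:
            continue
        summary[username] = {"primary": primary, "multilingual": len(counts) > 1}
    return summary
```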
User counts
Twitter
Language User Count
English (en) 375,474
Japanese (ja) 137,263
Portuguese (pt) 133,501
Malay/Indonesian (ms) 106,223
Spanish (es) 70,246
Dutch (nl) 31,035
Korean (ko) 16,123
Thai (th) 8,629
Arabic (ar) 7,679
French (fr) 5,769
Filipino/Tagalog (fil) 5,393
Wikipedia
Language User Count
English 22,412
German 4,920
French 3,430
Russian 3,330
Spanish 3,299
Japanese 3,164
Italian 2,202
Chinese 1,975
Portuguese 1,220
Polish 1,011
Dutch 1,007
Twitter: Multilinguals vs Monolinguals
On Twitter, 11% of users (~103,000) were observed to use more than one
language and designated as multilingual users.
Multilingual vs. monolingual users: Comparison of tweet count, out-degree, and
in-degree.
Wikipedia: Multilinguals vs Monolinguals
On Wikipedia, 15.4% of users
(8,544) edited more than one
language edition and were
designated as multilingual users.
Density plot compares the
number of edits made by
monolingual and multilingual
Wikipedia users. Size of edits
does not differ significantly.
Only 2.6% of edits are from users
writing in their non-primary
languages on Wikipedia.
Twitter: Language and structure
Label propagation algorithm (Raghavan, Albert, & Kumara, 2007) found
20,253 communities.
Histograms of the size of communities (left) and the number of languages within
each community (right). Modularity score of 0.81 for this community structure.
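To make the modularity score concrete, the sketch below computes modularity for a given partition of a small graph. It assumes an unweighted, undirected simplification (the network in the study is directed and weighted), and the function name `modularity` and the toy graph are illustrative only. The partition itself would come from a community detection step such as label propagation, which is not shown here.

```python
from collections import Counter


def modularity(edges, community_of):
    """Modularity Q = sum over communities of (L_c / m) - (d_c / 2m)^2.

    `edges` is a list of undirected (u, v) pairs, m is the number of edges,
    L_c the number of edges inside community c, and d_c the total degree of
    the nodes in c.
    """
    m = len(edges)
    internal = Counter()
    degree = Counter()
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, split into their natural communities.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, community_of)
```

In this toy case each community has 3 internal edges and total degree 7 out of m = 7 edges, so Q = 2 × (3/7 − 1/4) = 5/14. Values closer to 1 indicate that most edges fall within communities rather than between them.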
Twitter: Language and structure
Scatter plot of community size and
the percentage of users in the
community most often using the most
prevalent language.
Language and structure
Most-used language    % users in most-used language    Number of languages    Number of nodes
Malay (ms)            78.3                             41                     123,616
English (en)          99.3                             39                     114,826
Portuguese (pt)       94.3                             40                     101,987
Japanese (ja)         99.6                             19                     83,785
English (en)          75.7                             44                     80,387
English (en)          55.1                             42                     37,688
Dutch (nl)            90.6                             23                     20,634
Table: Clusters with over 10,000 nodes found through the label propagation
algorithm. Collectively, 61% of all users are in one of these clusters.
Twitter: Do multilinguals bridge clusters?
Size of the largest, weakly-connected component (left), total number of components
(center), and average size of the components (right) created by removing all
multilingual users, an equivalent number of monolingual users randomly, an
equivalent number of all users randomly, and removing all multilingual users from a
network with the same degree distribution but with edges randomly shuffled. Box
plots show values from 100 realizations. Mean values are indicated with +.
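The removal experiment described in this caption can be sketched as below. This is a simplified illustration under stated assumptions: edge direction is ignored when finding weakly connected components, the helper names (`component_sizes`, `measure_after_removal`, `random_baseline`) are hypothetical, and the degree-preserving edge shuffle used as the fourth comparison is not shown.

```python
import random
from collections import Counter


def component_sizes(nodes, edges):
    """Sizes of weakly connected components, largest first (direction ignored)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        if u in parent and v in parent:  # skip edges touching removed nodes
            parent[find(u)] = find(v)
    return sorted(Counter(find(n) for n in parent).values(), reverse=True)


def measure_after_removal(nodes, edges, removed):
    """Largest component, component count and mean size after removing nodes."""
    kept = set(nodes) - set(removed)
    sizes = component_sizes(kept, edges)
    return {
        "largest": sizes[0] if sizes else 0,
        "components": len(sizes),
        "mean_size": sum(sizes) / len(sizes) if sizes else 0.0,
    }


def random_baseline(nodes, edges, candidates, k, runs=100, seed=0):
    """Repeat the measurement while removing k randomly chosen candidate nodes."""
    rng = random.Random(seed)
    pool = sorted(candidates)
    return [measure_after_removal(nodes, edges, rng.sample(pool, k))
            for _ in range(runs)]
```

Comparing `measure_after_removal(nodes, edges, multilinguals)` against `random_baseline(nodes, edges, monolinguals, k=len(multilinguals))` mirrors the first two conditions in the figure: if multilingual users bridge clusters, removing them should fragment the network more than removing the same number of randomly chosen monolingual users.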
Wikipedia: Do multilinguals bridge editions?
Do multilinguals edit similar articles across languages?
A large number of users did not edit any of the same articles in their primary
languages, but a large number of users also always edited the same articles in their
primary languages.
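One way to quantify this kind of overlap per user is sketched below. The measure is an assumption made for illustration, not necessarily the one behind the figure: the share of items a user edited in a non-primary edition that the same user also edited in the primary edition, with articles already mapped to language-independent item identifiers. The function name `overlap_share` is hypothetical.

```python
def overlap_share(primary_items, non_primary_items):
    """Share of non-primary-edition items also edited in the primary edition.

    Returns a value in [0, 1], or None if the user has no non-primary edits.
    """
    non_primary = set(non_primary_items)
    if not non_primary:
        return None
    return len(non_primary & set(primary_items)) / len(non_primary)
```

Under this reading, a value of 0 corresponds to a user who edited none of the same articles across languages, and a value of 1 to a user who always edited the same articles.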
Variations by language
Twitter Wikipedia
Number of users in each language compared to the percentage of these users
classified as multilingual.
Twitter: Cross-language connections
[Figure: network graph with language nodes ar, de, en, es, fil, fr, gl, it, ja, ko, ms, nl, pt, th]
Mentions and retweets across
languages
Nodes represent most-used
language
Directed, weighted edges show
the log of the number of users
primarily using one language who
mention / retweet users in
another language
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
N.B. This differs from the published paper where edges were normalized by the expected number of connections between language
pairs if tweets were directed at users randomly without regard to language.
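The edge weighting and filtering used for this graph can be sketched as follows. The natural logarithm, the population standard deviation, and the function name `significant_edges` are assumptions; the slide does not specify these details.

```python
import math
import statistics


def significant_edges(user_counts, z=1.96):
    """Keep language pairs whose log user count exceeds mean + z * std. dev.

    `user_counts` maps (source language, target language) to the number of
    users primarily using the source language who mention or retweet users
    of the target language.
    """
    weights = {pair: math.log(n) for pair, n in user_counts.items() if n > 0}
    mean = statistics.mean(weights.values())
    sd = statistics.pstdev(weights.values())
    cutoff = mean + z * sd
    return {pair: w for pair, w in weights.items() if w > cutoff}
```

Because the cutoff is relative to the distribution of all edge weights, only unusually strong language pairs remain in the plotted graph.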
Wikipedia: Language crossings
[Figure: network graph with language-edition nodes ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh]
Co-editing network graph
Nodes represent language
editions
Directed, weighted edges show
the log of the number of users
primarily editing one language
edition who edited another
edition
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
Wikipedia: Language crossings (English removed)
[Figure: network graph with language-edition nodes ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh]
Co-editing network graph
Nodes represent language
editions
Directed, weighted edges show
the log of the number of users
primarily editing one language
edition who edited another
edition
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
Summary and Implications
Multilingualism correlated with
activity on both platforms
Design for multilingual users
Allow users to have multiple
preferred languages when
personalizing search results,
friend recommendations, etc.
Structured by language
Language has a strong role
structuring both platforms
Multilingual users in position to
bridge clusters/editions, but
mixed evidence on actual role
Multilingual user percentage ∝
1/self-focus bias
Important per-language variations
Users in less-represented languages
more likely to cross language
boundaries on Wikipedia, but no
correlation on Twitter.
Platform differences?
Consistent findings of English
and Japanese as outliers
Larger languages form bridges
Especially English, but
Other geolinguistic patterns
evident
Global connectivity results
through the combination of
multilinguals across many
language pairs
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall
who provided helpful comments on various aspects of this research.
Barnett, G. A., & Choi, Y. (1995). Physical Distance and Language as
Determinants of the International Telecommunications Network.
International Political Science Review, 16(3), 249–265. Available from
http://ips.sagepub.com/content/16/3/249.abstract
Birner, B. (2005). Bilingualism (Tech. Rep.). Washington, DC, USA:
Linguistic Society of America. Available from
http://www.linguisticsociety.org/files/Bilingual.pdf
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks:
How Multilingual Users of Twitter Connect Language Communities.
Proceedings of the American Society for Information Science and
Technology, 49(1), 1–4. Available from
http://dx.doi.org/10.1002/meet.14504901327
Erard, M. (2012, January). Are we Really Monolingual? Available from
http://www.nytimes.com/2012/01/15/opinion/sunday/
are-we-really-monolingual.html
Graham, M., Hale, S. A., & Gaffney, D. (2013). Where in the world are
you? Geolocation and language identification in Twitter. Professional
Geographer.
Hale, S. A. (2012a). Impact of platform design on cross-language
information exchange. In Proceedings of the 2012 ACM annual
conference on human factors in computing systems extended abstracts
(pp. 1363–1368). New York, NY, USA: ACM. Available from
http://doi.acm.org/10.1145/2212776.2212456
Hale, S. A. (2012b). Net Increase? Cross-Lingual Linking in the
Blogosphere. Journal of Computer-Mediated Communication, 17(2),
135–151. Available from http://onlinelibrary.wiley.com/doi/
10.1111/j.1083-6101.2011.01568.x/full
Hale, S. A. (2014a). Global Connectivity and Multilinguals in the Twitter
Network. In Proceedings of the SIGCHI conference on human factors in
computing systems (pp. 833–842). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/2556288.2557203
Hale, S. A. (2014b). Multilinguals and Wikipedia Editing. In Proceedings of
the 6th annual ACM web science conference. New York, NY, USA:
ACM. Available from http://arxiv.org/abs/1312.0976
Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in
community-maintained knowledge repositories. In Proceedings of the
fourth international conference on communities and technologies (pp.
11–20). New York, NY, USA: ACM. Available from
http://doi.acm.org/10.1145/1556460.1556463
Hecht, B., & Gergle, D. (2010a). On the “localness” of user-generated
content. In Proceedings of the 2010 ACM conference on computer
supported cooperative work (pp. 229–232). New York, NY, USA:
ACM. Available from
http://doi.acm.org/10.1145/1718918.1718962
Hecht, B., & Gergle, D. (2010b). The Tower of Babel meets Web 2.0:
User-generated content and its applications in a multilingual context.
In Proceedings of the 28th international conference on human factors
in computing systems (pp. 291–300). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/1753326.1753370
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E.,
Stoerger, S., et al. (2007). Language Networks on LiveJournal. In
Proceedings of the 40th annual Hawaii international conference on
system sciences. Washington, DC, USA: IEEE Computer Society.
Available from http://dx.doi.org/10.1109/HICSS.2007.320
Nordenstreng, K., & Varis, T. (1974). Television traffic: A one-way street?
A survey and analysis of the international flow of television programme
material. Reports and Papers on Mass Communication(70).
Raghavan, U. N., Albert, R., & Kumara, S. (2007, September). Near linear
time algorithm to detect community structures in large-scale networks.
Phys. Rev. E, 76(3), 36106. Available from
http://link.aps.org/doi/10.1103/PhysRevE.76.036106
Takhteyev, Y., Gruzd, A., & Wellman, B. (2011). Geography of Twitter
networks. Social Networks, 1–26. Available from
http://www.sciencedirect.com/science/article/pii/
S0378873311000359#FCANote
Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language
and Internet diffusion patterns in Uzbekistan. New Review of
Hypermedia and Multimedia, 11(2), 205–220.
Wilkinson, D., & Thelwall, M. (2012). Trending Twitter topics in English:
An international comparison. Journal of the American Society for
Information Science and Technology, 63(8), 1631–1646. Available
from http://dx.doi.org/10.1002/asi.22713
Zuckerman, E. (2008). Meet the bridgebloggers. Public Choice, 134(1),
47–65.
Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of
Connection. London: W. W. Norton & Company.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 

Recently uploaded (20)

FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 

Design and Multilingual Users on Twitter and Wikipedia

  • 1. Design and Multilingual Users on Twitter and Wikipedia Scott A. Hale scott.hale@oii.ox.ac.uk http://www.scotthale.net/ Oxford Internet Institute University of Oxford 17 June 2014 Scott A. Hale Design and Multilingual Users
  • 2. Importance of design
  • 5. Content is diverse across languages
      “multilingualism...[is] the norm for most of the world’s societies” (Birner, 2005), with over half of Europe and over a fifth of the US multilingual (Erard, 2012); yet many platforms are designed only with monolingual users in mind.
      In an Uzbekistan survey, Internet users reported accessing content in foreign languages even while simultaneously reporting poor foreign language skills (Wei & Kolko, 2005).
      Users often contribute local content/knowledge (Hecht & Gergle, 2010a).
      Large diversity in information between languages (Hecht & Gergle, 2010b).
      Can lead to self-focus bias (Hecht & Gergle, 2009).
  • 7. Motivations
      Language clustering vs. small-worlds
        Users thought to cluster by language in most online platforms (Barnett & Choi, 1995; Hale, 2012a, 2012b; Herring et al., 2007; Nordenstreng & Varis, 1974; Takhteyev et al., 2011; Wilkinson & Thelwall, 2012)
        Many online platforms thought to exhibit the ‘small-world’ phenomenon of short path lengths between users (despite high clustering)
      Role of multilingual users
        ⇒ If users cluster by language and platforms are small-worlds, there must be brokers bridging different language groups (spanning structural holes)
        Multilingual users are possible bridge users.
        Only one study has investigated this: an ego-net level study of the Twitter following–follower network structure (Eleta & Golbeck, 2012).
        No multiplatform study and no large-scale study exists.
  • 8. Outline
      What are the roles of multilinguals and platform design in shaping the spread of information in social media? Twitter and Wikipedia at a global level.
      1. Language will have a strong role in structuring the platform
      2. Users engaging with content in multiple languages (multilingual users) serve as bridges between different clusters/editions
      3. Users primarily writing in less-represented languages will be more likely to cross language boundaries than users writing in highly-represented languages
      4. When users cross languages they will cross to larger languages (e.g. English), and thus at a language level English will form more bridges than other languages
  • 9. Data
      Twitter
        Twitter mentions, retweet network
        18 days of ‘spritzer’ 1% sample stream from June 2011
        7,341,271 nodes; 8,545,693 directed, weighted edges
      Wikipedia
        Edits from top 46 language editions
        8 July to 9 August 2013
        3.5 million non-minor edits by 55,568 registered users
      Global Connectivity and Multilinguals in the Twitter Network (2014). http://www.scotthale.net/pubs/?chi2014
      Multilinguals and Wikipedia Editing (2014). http://www.scotthale.net/pubs/?websci2014
  • 13. Twitter: Data cleaning
      Language classification
        Clean text of tweets for language detection (remove URLs, usernames, emoticons)
        Use Chromium Compact Language Detection kit for language detection (Graham et al., 2013)
        Remove users with fewer than 2 tweets or 20% of the user’s tweets in one language
        Remove users with fewer than four tweets total
      Bots and spam users
        Remove users with no mentions (indegree=0)
        Select only the largest weakly-connected component (88% of nodes)
      End result
        916,836 nodes (users) and 2,652,618 directed edges (mentions/retweets)
        Each user is assigned their most-used language and the frequency [0-1] with which that language is used
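The cleaning and per-user language assignment described on this slide can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the regular expressions, the function names `clean_tweet` and `summarize_user`, and the sample data are all assumptions, and language detection itself is assumed to have already produced one language code per tweet. The per-language threshold (2 tweets or 20%) is not modelled here.

```python
import re
from collections import Counter

# Hypothetical patterns; the slides do not give the exact rules used.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
EMOTICON_RE = re.compile(r"[:;=][-^']?[)(DPp/\\|\]\[]")


def clean_tweet(text):
    """Strip URLs, @usernames and simple emoticons before language detection."""
    # URLs are removed first so that "://" is not mistaken for an emoticon.
    for pattern in (URL_RE, USER_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def summarize_user(langs, min_tweets=4):
    """Return (most-used language, fraction of tweets in it), or None.

    `langs` holds one detected language code per tweet. Users with fewer
    than `min_tweets` tweets are dropped, as on the slide.
    """
    if len(langs) < min_tweets:
        return None
    lang, count = Counter(langs).most_common(1)[0]
    return lang, count / len(langs)


cleaned = clean_tweet("@bob check http://t.co/abc :) hello")
summary = summarize_user(["en", "en", "ja", "en"])
```

A user whose returned fraction is below 1.0 has tweets in more than one language, which is the starting point for the multilingual designation used later in the deck.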
  • 14. Wikipedia: Data cleaning
      Non-minor edits by registered, human users to articles
        Only edits to main (article) namespace
        Removed articles flagged as being created by ‘bots’
        Removed anonymous users
        Removed undeclared bots and users with only one edit session in the month
        Require at least four edits and at least 2 edits to one edition
      Matching users and articles across languages
        Look for common usernames across language editions
        Check usernames are indeed linked global accounts
        WikiData dump to match articles across languages
      55,568 users (excluding Simple English edition) with a total of 3,518,955 edits.
  • 15. User counts
      Twitter
        Language                 User Count
        English (en)                375,474
        Japanese (ja)               137,263
        Portuguese (pt)             133,501
        Malay/Indonesian (ms)       106,223
        Spanish (es)                 70,246
        Dutch (nl)                   31,035
        Korean (ko)                  16,123
        Thai (th)                     8,629
        Arabic (ar)                   7,679
        French (fr)                   5,769
        Filipino/Tagalog (fil)        5,393
      Wikipedia
        Language                 User Count
        English                      22,412
        German                        4,920
        French                        3,430
        Russian                       3,330
        Spanish                       3,299
        Japanese                      3,164
        Italian                       2,202
        Chinese                       1,975
        Portuguese                    1,220
        Polish                        1,011
        Dutch                         1,007
  • 16. Twitter: Multilinguals vs Monolinguals
      On Twitter, 11% of users (~103,000) were observed to use more than one language and were designated as multilingual users.
      Multilingual vs. monolingual users: comparison of tweet count, out-degree, and in-degree.
  • 18. Wikipedia: Multilinguals vs Monolinguals
      On Wikipedia, 15.4% of users (8,544) edited more than one language edition and were designated as multilingual users.
      Density plot compares the number of edits made by monolingual and multilingual Wikipedia users. Size of edits does not differ significantly.
      Only 2.6% of edits are from users writing in their non-primary languages on Wikipedia.
  • 19. Twitter: Language and structure
      Label propagation algorithm (Raghavan, Albert, & Kumara, 2007) found 20,253 communities.
      Histograms of the size of communities (left) and the number of languages within each community (right).
      Modularity score of 0.81 for this community structure.
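For readers unfamiliar with the two measures named on this slide, the sketch below shows a simplified label propagation pass and a modularity calculation on a toy graph. It is an illustration under stated assumptions, not the study's implementation: the graph is treated as undirected and unweighted (the Twitter network in the slides is directed and weighted), and the function names, tie-breaking rule, and example adjacency are hypothetical.

```python
import random
from collections import Counter, defaultdict


def label_propagation(adj, seed=0, max_sweeps=100):
    """Simplified asynchronous label propagation on an undirected graph.

    `adj` maps each node to the set of its neighbours. Every node starts
    with its own label and repeatedly adopts the most common label among
    its neighbours until no label changes.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adj}
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not adj[node]:
                continue
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # current label is already among the most common
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    return labels


def modularity(adj, labels):
    """Modularity Q for an undirected, unweighted graph and a partition."""
    m = sum(len(nbs) for nbs in adj.values()) / 2  # number of edges
    intra = defaultdict(float)   # edges inside each community
    degree = defaultdict(float)  # summed degree of each community
    for node, nbs in adj.items():
        degree[labels[node]] += len(nbs)
        for nb in nbs:
            if labels[nb] == labels[node]:
                intra[labels[node]] += 0.5  # each edge is seen twice
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy example: two disconnected triangles.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
labels = label_propagation(adj)
q = modularity(adj, labels)
```

On this toy graph each triangle ends up with a single label, giving two communities and a modularity of 0.5. Values closer to 1, such as the 0.81 reported on the slide, indicate that most edges fall within communities rather than between them.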
  • 20. Twitter: Language and structure
      Scatter plot of community size and the percentage of users in the community most often using the most prevalent language.
  • 21. Language and structure
      Most-used language    % users in most-used language    Number of languages    Number of nodes
      Malay (ms)                                     78.3                     41            123,616
      English (en)                                   99.3                     39            114,826
      Portuguese (pt)                                94.3                     40            101,987
      Japanese (ja)                                  99.6                     19             83,785
      English (en)                                   75.7                     44             80,387
      English (en)                                   55.1                     42             37,688
      Dutch (nl)                                     90.6                     23             20,634
      Table: Clusters with over 10,000 nodes found through the label propagation algorithm. Collectively 61% of all users are in one of these clusters.
  • 22. Twitter: Do multilinguals bridge clusters?
      Size of the largest weakly-connected component (left), total number of components (center), and average size of the components (right) created by:
        removing all multilingual users,
        removing an equivalent number of monolingual users randomly,
        removing an equivalent number of all users randomly, and
        removing all multilingual users from a network with the same degree distribution but with edges randomly shuffled.
      Box plots show values from 100 realizations. Mean values are indicated with +.
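The removal experiment described on this slide can be illustrated with a small sketch. It assumes a plain edge list and a known set of multilingual users; the function names and the toy network are hypothetical, and only two of the four conditions (removing multilinguals, removing a random equal-sized set of monolinguals) are shown.

```python
import random
from collections import defaultdict, deque


def component_sizes(edges, nodes, removed=frozenset()):
    """Sizes of weakly-connected components after dropping `removed` nodes."""
    keep = set(nodes) - set(removed)
    adj = defaultdict(set)
    for src, dst in edges:
        if src in keep and dst in keep:
            # Edge direction is ignored for weak connectivity.
            adj[src].add(dst)
            adj[dst].add(src)
    seen, sizes = set(), []
    for start in keep:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def summarize(sizes):
    """(largest component size, number of components, average size)."""
    return max(sizes), len(sizes), sum(sizes) / len(sizes)


# Toy network: user "m" is the only link between two language clusters.
nodes = ["a", "b", "c", "m", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "m"), ("m", "d"), ("d", "e")]
multilingual = {"m"}

baseline = summarize(component_sizes(edges, nodes))
without_multi = summarize(component_sizes(edges, nodes, multilingual))

# Comparison condition: remove the same number of monolingual users at random.
rng = random.Random(0)
monolingual = [n for n in nodes if n not in multilingual]
random_removed = set(rng.sample(monolingual, len(multilingual)))
without_random = summarize(component_sizes(edges, nodes, random_removed))
```

Repeating the random condition many times (the slide reports 100 realizations) gives the distribution against which the multilingual-removal result is compared.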
  • 24. Wikipedia: Do multilinguals bridge editions?
      Do multilinguals edit similar articles across languages?
      A large number of users did not edit any of the same articles in their primary languages, but a large number of users also always edited the same articles in their primary languages.
  • 25. Variations by language
      Twitter and Wikipedia: number of users in each language compared to the percentage of these users classified as multilingual.
  • 26. Twitter: Cross-language connections
      Mentions and retweets across languages (node labels: ar, de, en, es, fil, fr, gl, it, ja, ko, ms, nl, pt, th)
        Nodes represent most-used language
        Directed, weighted edges show the log of the number of users primarily using one language who mention / retweet users in another language
        Only edges with weights over 1.96 standard deviations above the mean are shown
        Colors indicate communities found by the infomap community detection algorithm
      N.B. This differs from the published paper, where edges were normalized by the expected number of connections between language pairs if tweets were directed at users randomly without regard to language.
  • 27. Wikipedia: Language crossings
      Co-editing network graph (node labels: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh)
        Nodes represent language editions
        Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
        Only edges with weights over 1.96 standard deviations above the mean are shown
        Colors indicate communities found by the infomap community detection algorithm
  • 28. Wikipedia: Language crossings (English removed)
      Co-editing network graph (node labels: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh)
        Nodes represent language editions
        Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
        Only edges with weights over 1.96 standard deviations above the mean are shown
        Colors indicate communities found by the infomap community detection algorithm
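The edge filter used in the three network graphs above (log-weighted edges, keeping only those more than 1.96 standard deviations above the mean) could be sketched as follows. The function name, the use of the population standard deviation, the natural logarithm, and the user counts are all assumptions made for illustration; the slides do not specify these details.

```python
import math
from statistics import mean, pstdev


def significant_edges(counts, z=1.96):
    """Keep edges whose log weight exceeds mean + z * standard deviation.

    `counts` maps (source language, target language) to the number of
    users primarily using the source language who also cross to the
    target language.
    """
    weights = {pair: math.log(n) for pair, n in counts.items() if n > 0}
    values = list(weights.values())
    cutoff = mean(values) + z * pstdev(values)
    return {pair: w for pair, w in weights.items() if w > cutoff}


# Made-up counts: nine ordinary language pairs and one unusually strong one.
counts = {
    (src, "en"): 10
    for src in ["ar", "de", "es", "fr", "it", "ja", "ko", "nl", "pt"]
}
counts[("ms", "en")] = 100_000

kept = significant_edges(counts)
```

With these invented counts only the ("ms", "en") edge survives the filter. Because the cutoff depends on the mean and spread of all edges, removing a dominant node such as English changes which of the remaining edges are shown.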
  • 32. Summary and Implications
      Multilingualism correlated with activity on both platforms
        Design for multilingual users
        Allow users to have multiple preferred languages when personalizing search results, friend recommendations, etc.
      Structured by language
        Language has a strong role structuring both platforms
        Multilingual users in position to bridge clusters/editions, but mixed evidence on actual role
        Multilingual user percentage ∝ 1/self-focus bias
      Important per-language variations
        Users in less-represented languages more likely to cross language boundaries on Wikipedia, but no correlation on Twitter. Platform differences?
        Consistent findings of English and Japanese as outliers
      Larger languages form bridges
        Especially English, but other geolinguistic patterns evident
        Global connectivity results through the combination of multilinguals across many language pairs
  • 33. Design and Multilingual Users on Twitter and Wikipedia
      Scott A. Hale, scott.hale@oii.ox.ac.uk, http://www.scotthale.net/
      Oxford Internet Institute, University of Oxford
      17 June 2014
      I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall, who provided helpful comments on various aspects of this research.
  • 34. Barnett, G. A., & Choi, Y. (1995). Physical Distance and Language as Determinants of the International Telecommunications Network. International Political Science Review, 16(3), 249–265. Available from http://ips.sagepub.com/content/16/3/249.abstract
      Birner, B. (2005). Bilingualism (Tech. Rep.). Washington, DC, USA: Linguistic Society of America. Available from http://www.linguisticsociety.org/files/Bilingual.pdf
      Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks: How Multilingual Users of Twitter Connect Language Communities. Proceedings of the American Society for Information Science and Technology, 49(1), 1–4. Available from http://dx.doi.org/10.1002/meet.14504901327
      Erard, M. (2012, January). Are we Really Monolingual? Available from http://www.nytimes.com/2012/01/15/opinion/sunday/are-we-really-monolingual.html
  • 35. Graham, M., Hale, S. A., & Gaffney, D. (2013). Where in the world are you? Geolocation and language identification in Twitter. Professional Geographer.
      Hale, S. A. (2012a). Impact of platform design on cross-language information exchange. In Proceedings of the 2012 ACM annual conference on human factors in computing systems extended abstracts (pp. 1363–1368). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2212776.2212456
      Hale, S. A. (2012b). Net Increase? Cross-Lingual Linking in the Blogosphere. Journal of Computer-Mediated Communication, 17(2), 135–151. Available from http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2011.01568.x/full
      Hale, S. A. (2014a). Global Connectivity and Multilinguals in the Twitter Network. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203
  • 36. Hale, S. A. (2014b). Multilinguals and Wikipedia Editing. In Proceedings of the 6th annual ACM web science conference. New York, NY, USA: ACM. Available from http://arxiv.org/abs/1312.0976
      Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the fourth international conference on communities and technologies (pp. 11–20). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1556460.1556463
      Hecht, B., & Gergle, D. (2010a). On the “localness” of user-generated content. In Proceedings of the 2010 ACM conference on computer supported cooperative work (pp. 229–232). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1718918.1718962
      Hecht, B., & Gergle, D. (2010b). The Tower of Babel meets Web 2.0: User-generated content and its applications in a multilingual context. In Proceedings of the 28th international conference on human factors in computing systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370
  • 37. Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E., Stoerger, S., et al. (2007). Language Networks on LiveJournal. In Proceedings of the 40th annual Hawaii international conference on system sciences. Washington, DC, USA: IEEE Computer Society. Available from http://dx.doi.org/10.1109/HICSS.2007.320
      Nordenstreng, K., & Varis, T. (1974). Television traffic: A one-way street? A survey and analysis of the international flow of television programme material. Reports and Papers on Mass Communication (70).
      Raghavan, U. N., Albert, R., & Kumara, S. (2007, September). Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E, 76(3), 36106. Available from http://link.aps.org/doi/10.1103/PhysRevE.76.036106
      Takhteyev, Y., Gruzd, A., & Wellman, B. (2011). Geography of Twitter networks. Social Networks, 1–26. Available from http://www.sciencedirect.com/science/article/pii/S0378873311000359#FCANote
  • 38. Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language and Internet diffusion patterns in Uzbekistan. New Review of Hypermedia and Multimedia, 11(2), 205–220.
      Wilkinson, D., & Thelwall, M. (2012). Trending Twitter topics in English: An international comparison. Journal of the American Society for Information Science and Technology, 63(8), 1631–1646. Available from http://dx.doi.org/10.1002/asi.22713
      Zuckerman, E. (2008). Meet the bridgebloggers. Public Choice, 134(1), 47–65.
      Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of Connection. London: W. W. Norton & Company.