Presentation given at MIT Media Lab on June 17, 2014. Presents ongoing work on design and multilingual users. Two recent papers are "Global Connectivity and Multilinguals in the Twitter Network" (http://www.scotthale.net/pubs/?chi2014) and "Multilinguals and Wikipedia Editing" (http://www.scotthale.net/pubs/?websci2014)
Gender Gap in Collaborative Platforms: Language and emotions in Wikipedia Dis... (David Laniado)
Slides presented at UPF:
https://www.upf.edu/web/mdm-dtic/gender-and-wikipedia_2017
https://www.upf.edu/en/web/guest/home/-/asset_publisher/UI8Z8VAxU47P/content/id/7282941/maximized#.WIjEE2dA-Ba
The presentation focuses on two studies that investigate differences in the language used by men and women on Wikipedia talk pages. Automatic message analysis reveals that women participate more in discussions that have a positive tone, and use language more oriented toward relationships and emotional connection than men do. We also observe a gender difference in leadership style: while men administrators tend to maintain a more impersonal tone than other users, women administrators are indistinguishable from other women and use markedly emotional and relationship-oriented language. The results suggest that communication style matters for addressing the gender gap in online collaboration platforms and for fostering more welcoming environments capable of attracting and retaining users.
Global connectivity and multilinguals in the Twitter network (slides) (Scott A. Hale)
This article analyzes the global connectivity of the Twitter retweet and mentions network and the role of multilingual users engaging with content in multiple languages. The network is heavily structured by language with most mentions and retweets directed to users writing in the same language. Users writing in multiple languages are more active, authoring more tweets than monolingual users. These multilingual users play an important bridging role in the global connectivity of the network. The mean level of insularity from speakers in each language does not correlate straightforwardly with the size of the user base as predicted by previous research. Finally, the English language does play more of a bridging role than other languages, but the role played collectively by multilingual users across different languages is the largest bridging force in the network. Full paper at http://www.scotthale.net/pubs/?chi2014
http://www.scotthale.net/pubs/?websci2014
This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.
LRC XIII Localisation Conference - Using community feedback to improve social... (sarni)
Unlike more traditional software such as operating systems and office productivity suites, social networking technology, and therefore its terminology, has developed rapidly over the last few years.
With more Web 2.0 applications featuring a strong social aspect, including the voice of the customer is becoming increasingly important for capturing the latest terminology used in the social networking domain. To succeed in tomorrow’s marketplace, established businesses need to create business models that include their customers while leveraging the global expertise, know-how and future potential those customers bring to the table.
To address this challenge, Microsoft has launched several initiatives to embrace end-users and the “community”. One of them is the “MTCF” terminology community engagement and feedback program designed to assess and improve the quality of localised Microsoft Messenger and Spaces terminology through community feedback with a focus on social networking terminology.
This presentation will cover lessons learned from the 1,900+ terminology suggestions received across 29 EMEA languages during this feedback program. It will explore interesting observations from the community about existing terminology, implications for source terminology, the importance of style and “artistic license” in translation, and challenges to existing (and often anecdotal) assumptions about terminology quality.
Multilingual user interface for website using resource files (eSAT Journals)
Abstract: Due to the rapid growth of Internet usage, new kinds of problems keep emerging, and one of them is the language barrier between Internet users, so it is important to build systems that overcome this barrier. Some websites do provide language-switching options, but these solutions are not satisfactory because multiple pages must be developed for multiple languages, which wastes a lot of resources. This paper discusses a multilingual implementation of a website using resource files in ASP.NET. ASP.NET and the .NET Framework ship with support for multilingual applications, namely in the form of resource files. A Multilingual User Interface (MUI) enables the localization of user interfaces for globalized applications and supports the creation of resources for any number of user interface languages. MUI ships a single core of functionality to all platforms independent of UI language, which significantly reduces development and testing effort. The most visible benefit of MUI is that multiple users can share the same web page and view its user interface in different languages. Multilingual access to website content makes it significantly easier for multilingual users to meet their needs. Keywords: Language Barriers, Multilingual Website, Resource File.
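The core idea of resource files is that one page template looks up its UI strings per language, falling back to less specific resources when a translation is missing. The sketch below illustrates that lookup in Python. It assumes hypothetical in-memory tables keyed by culture name, with a specific-to-neutral-to-default fallback chain similar in spirit to .NET's resource fallback. It is not the ASP.NET API, and the table contents are invented.

```python
# Hypothetical resource tables, one per culture, standing in for separate
# resource files (e.g. Strings.resx, Strings.pt.resx, Strings.pt-BR.resx).
RESOURCES: dict[str, dict[str, str]] = {
    "": {"greeting": "Welcome", "logout": "Log out"},  # neutral/default resources
    "pt": {"greeting": "Bem-vindo", "logout": "Sair"},
    "pt-BR": {"greeting": "Bem-vindo ao site"},
    "ja": {"greeting": "ようこそ"},
}


def fallback_chain(culture: str) -> list[str]:
    """Cultures to try, most specific first: 'pt-BR' -> 'pt' -> '' (default)."""
    chain = []
    parts = culture.split("-") if culture else []
    while parts:
        chain.append("-".join(parts))
        parts.pop()
    chain.append("")  # always end with the default resources
    return chain


def get_string(key: str, culture: str) -> str:
    """Look up a UI string, falling back to less specific cultures."""
    for name in fallback_chain(culture):
        table = RESOURCES.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(f"no resource {key!r} for culture {culture!r}")


# One page, several users, each seeing the UI in their own language.
for user_culture in ["pt-BR", "ja", "fr"]:
    print(user_culture, get_string("greeting", user_culture), get_string("logout", user_culture))
```

A "pt-BR" user gets the Brazilian greeting but the generic Portuguese "Sair". A "fr" user, for whom no French table exists, falls back to the default strings. This is why only missing strings, not whole pages, need to be duplicated per language.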
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of engineering and technology.
Project CASL (Conference ASL) was initiated to use widespread technology to build a solution that provides remote viewers with real-time, web-based simultaneous ASL interpretation of TEDx livestreams.
To what extent do people tweet in other languages beyond English?
How do language groups interact with each other?
Is there an effect of language on online user interaction?
I gave this presentation at the first PKP Scholarly Publishing Conference in Vancouver, Canada, on July 12, 2007. Check out the general conference blog if you want to know more about the event:
http://scholarlypublishing.blogspot.com/
You may also be interested in things marked with the "open-access" tag in my own blog:
http://corpblawg.ynada.com/
Dr Scott A Hale introduced and facilitated discussion on the latest research updates and research needs at the Trusted Media Summit in December 2019. This summit brought together media organizations throughout APAC.
Big Tech & Disinformation: What are the main threats and how can journalists ... (Scott A. Hale)
Dr Scott A Hale presented these slides at the 2019 News Impact Summit in Lyon, France, hosted by the European Journalism Centre and the Google News Initiative.
https://newsimpact.io/summits/news-impact-summit-lyon
Similar to Design and Multilingual Users on Twitter and Wikipedia
Foreign-language Reviews: Help or Hindrance? (Slides) (Scott A. Hale)
Full paper at http://scott.hale.us/pubs/?chi2017
The number and quality of user reviews greatly affects consumer purchasing decisions. While reviews in all languages are increasing, it is still often the case (especially for non-English speakers) that there are only a few reviews in a person’s first language. Using an online experiment, we examine the value that potential purchasers receive from interfaces showing additional reviews in a second language. The results paint a complicated picture with both positive and negative reactions to the inclusion of foreign-language reviews. Roughly 26–28% of subjects clicked to see translations of the foreign-language content when given the opportunity, and those who did so were more likely to select the product with foreign-language reviews than those who did not.
How much is said in a microblog? A multilingual inquiry based on Weibo and Tw... (Scott A. Hale)
Slides from the 2015 Web Science Conference presentation measuring how much can be and is said in microblog posts of different languages on Twitter and Weibo. Full paper at http://arxiv.org/abs/1506.00572
Mapping the UK Webspace: Fifteen Years of British Universities on the Web (Scott A. Hale)
Full paper at http://arxiv.org/abs/1405.2856
This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.
Slides for a presentation on recent work with Web Archives at the Oxford Internet Institute (http://www.oii.ox.ac.uk/) given at WIRE2014 (http://wp.comminfo.rutgers.edu/nsfia/schedule/)
ECPR 2011 Leaders and Followers Experiment (Scott A. Hale)
Leadership without Leaders? Starters and Followers in On-line Collective Action. These slides are from a presentation at the ‘Collective Action’ panel 517, ECPR Conference, Reykjavik, 26 August 2011. More information is available at http://www.governmentontheweb.org/
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group (“MCG”) expects demand and the evolution of supply to be shaped by institutional investment rotating out of offices and into work from home (“WFH”), and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, as shown by Sungard's recent second bankruptcy filing, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to be more than 3.6x larger by value in 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote management will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup and is sponsored by Zilliz, the maintainers of Milvus.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clearly defined owners and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision-making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
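Item 4's automated validation at the source can be sketched in a few lines. This is a minimal illustration that assumes a hypothetical feed of order records and hypothetical rules. It is not tied to any particular data-quality tool. Each record either passes every check or is rejected together with the names of the checks it failed, which keeps bad rows out of downstream systems and records why they were rejected.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    passes: Callable[[dict], bool]


# Hypothetical rules for a hypothetical "orders" feed.
CHECKS = [
    Check("order_id present", lambda r: bool(r.get("order_id"))),
    Check("amount is a non-negative number",
          lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    Check("currency is a 3-letter code",
          lambda r: isinstance(r.get("currency"), str) and len(r["currency"]) == 3),
]


def validate(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records into (valid, rejected); each rejected row lists the checks it failed."""
    valid, rejected = [], []
    for record in records:
        failures = [check.name for check in CHECKS if not check.passes(record)]
        if failures:
            rejected.append((record, failures))
        else:
            valid.append(record)
    return valid, rejected


valid, rejected = validate([
    {"order_id": "A1", "amount": 10.0, "currency": "EUR"},
    {"order_id": "", "amount": -5, "currency": "EURO"},
])
print(len(valid), rejected)
```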
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Machine learning and optimization techniques for electrical drives.pptx
Design and Multilingual Users on Twitter and Wikipedia
1. Design and Multilingual Users on Twitter and Wikipedia
Scott A. Hale
scott.hale@oii.ox.ac.uk
http://www.scotthale.net/
Oxford Internet Institute
University of Oxford
17 June 2014
Scott A. Hale Design and Multilingual Users
4. Content is diverse across languages
“multilingualism...[is] the norm for most of the world’s societies” (Birner, 2005), with over half of Europe and over a fifth of the US being multilingual (Erard, 2012); yet many platforms are designed only with monolingual users in mind.
In a survey in Uzbekistan, Internet users reported accessing content in foreign languages even while reporting poor foreign-language skills (Wei & Kolko, 2005)
Users often contribute local content/knowledge (Hecht & Gergle, 2010a)
Large diversity in information between languages (Hecht & Gergle, 2010b)
Can lead to self-focus bias (Hecht & Gergle, 2009)
6. Motivations
Language clustering vs. small-worlds
Users are thought to cluster by language on most online platforms (Barnett & Choi, 1995; Hale, 2012a, 2012b; Herring et al., 2007; Nordenstreng & Varis, 1974; Takhteyev, Gruzd, & Wellman, 2011; Wilkinson & Thelwall, 2012)
Many online platforms are thought to exhibit the ‘small-world’ phenomenon of short path lengths between users (despite high clustering)
Role of multilingual users
⇒ If users cluster by language and platforms are small worlds, there must be brokers bridging different language groups (spanning structural holes)
Multilingual users are possible bridge users. Only one study has investigated this: an ego-network-level study of Twitter following–follower network structure (Eleta & Golbeck, 2012).
No multiplatform study and no large-scale study
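The small-world argument pairs two measurements: high clustering (C) and short average path length (L). The toy illustration below uses pure Python on a synthetic graph, not on Twitter or Wikipedia data. It starts from a ring lattice and adds a handful of long-range edges standing in for bridge users. The shortcuts are placed deterministically rather than by the random rewiring of the classic Watts–Strogatz model, so the result is reproducible. The node count, degree and shortcut spacing are arbitrary choices.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n: int, k: int) -> dict[int, set[int]]:
    """Undirected ring where each node links to its k // 2 nearest neighbours on each side."""
    graph: dict[int, set[int]] = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            graph[v].add(u)
            graph[u].add(v)
    return graph


def average_clustering(graph: dict[int, set[int]]) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in graph.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += links / (d * (d - 1) / 2)
    return total / len(graph)


def average_path_length(graph: dict[int, set[int]]) -> float:
    """Mean BFS shortest-path length over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


n, k = 200, 4
lattice = ring_lattice(n, k)

# Same lattice plus 10 long-range edges (stand-ins for bridge users).
shortcut = ring_lattice(n, k)
for i in range(0, n // 2, 10):
    shortcut[i].add(i + n // 2)
    shortcut[i + n // 2].add(i)

for label, g in [("lattice", lattice), ("with shortcuts", shortcut)]:
    print(f"{label}: C = {average_clustering(g):.2f}, L = {average_path_length(g):.1f}")
```

The plain lattice gives C = 0.50 and L ≈ 25.4. With ten added edges, clustering only dips to 0.48 while L falls sharply. This is the pattern the slide relies on: if users cluster by language yet paths stay short, something must play the role of these long-range edges.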
8. Outline
What are the roles of multilinguals and platform design in shaping the spread of information in social media?
Twitter and Wikipedia at a global level
1 Language will have a strong role in structuring the platform
2 Users engaging with content in multiple languages (multilingual users) will serve as bridges between different clusters/editions
3 Users primarily writing in less-represented languages will be more likely to cross language boundaries than users writing in highly-represented languages
4 When users cross languages, they will cross to larger languages (e.g., English), and thus at a language level English will form more bridges than other languages
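Hypotheses 2 and 4 can be made concrete by counting edges whose endpoints have different primary languages. The sketch below does this in Python, assuming a hypothetical edge list, per-user language labels and a set of multilingual users. This simple count is illustrative only and is not the measure used in the papers. It reports how many cross-language edges each language takes part in (hypothesis 4) and what share of cross-language edges touch a multilingual user (hypothesis 2).

```python
from collections import Counter


def bridging_summary(edges, primary_lang, multilingual):
    """Count cross-language edges and how many touch a multilingual user.

    edges: iterable of (source, target) pairs (mentions/retweets)
    primary_lang: dict mapping each user to their most-used language
    multilingual: set of users observed writing in more than one language
    Returns (number of cross-language edges, per-language counts, share via multilinguals).
    """
    cross = [(s, t) for s, t in edges if primary_lang[s] != primary_lang[t]]
    per_language = Counter()
    for s, t in cross:
        per_language[primary_lang[s]] += 1  # each cross-language edge counts once
        per_language[primary_lang[t]] += 1  # for both languages it connects
    touching = sum(1 for s, t in cross if s in multilingual or t in multilingual)
    share = touching / len(cross) if cross else 0.0
    return len(cross), per_language, share


# Hypothetical toy network.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e"), ("c", "a")]
primary_lang = {"a": "en", "b": "en", "c": "ja", "d": "ja", "e": "pt"}
n_cross, per_language, share = bridging_summary(edges, primary_lang, multilingual={"c"})
print(n_cross, dict(per_language), share)
```

In this toy network, 4 of the 6 edges cross languages, and half of those involve the single multilingual user "c".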
9. Data
Twitter
Twitter mentions and retweet network
18 days of the ‘spritzer’ 1% sample stream from June 2011
7,341,271 nodes; 8,545,693 directed, weighted edges
Wikipedia
Edits from the top 46 language editions
8 July to 9 August 2013
3.5 million non-minor edits by 55,568 registered users
Global Connectivity and Multilinguals in the Twitter Network (2014).
http://www.scotthale.net/pubs/?chi2014
Multilinguals and Wikipedia Editing (2014).
http://www.scotthale.net/pubs/?websci2014
10. Twitter: Data cleaning
Language classification
Clean text of tweets for language detection (remove URLs, usernames, emoticons)
Use the Chromium Compact Language Detection kit for language detection (Graham, Hale, & Gaffney, 2013)
Remove users with fewer than 2 tweets, or less than 20% of their tweets, in one language
Remove users with fewer than four tweets total
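The cleaning and threshold steps above can be sketched in Python as follows. The regular expressions are simple illustrative assumptions, especially the small emoticon pattern. The detector itself (CLD) is not called, so the second function takes one already-detected language code per tweet. The thresholds follow one plausible reading of the slide: a user needs at least four tweets, and a language counts for that user only if it appears in at least 2 tweets and at least 20% of their tweets.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Deliberately small emoticon pattern (e.g. ":)", ";-P", "=D"), matched only
# when surrounded by whitespace; a real pipeline would need a fuller list.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=]-?[)(\]\[DPp/|]+(?!\S)")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, @usernames and simple emoticons before language detection."""
    for pattern in (URL_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


def user_languages(detected: list[str], min_total: int = 4,
                   min_tweets: int = 2, min_share: float = 0.2) -> dict[str, float]:
    """Languages counted for one user, mapped to their share of the user's tweets.

    `detected` holds one language code per cleaned tweet. Assumed reading of
    the thresholds: users with fewer than `min_total` tweets are dropped, and a
    language counts only if it has at least `min_tweets` tweets AND at least
    `min_share` of all the user's tweets.
    """
    total = len(detected)
    if total < min_total:
        return {}
    counts = Counter(detected)
    return {lang: n / total for lang, n in counts.items()
            if n >= min_tweets and n / total >= min_share}


print(clean_tweet("Great talk :) @alice http://t.co/xyz"))
print(user_languages(["en", "en", "en", "ja", "ja", "pt"]))
```

Under this reading, a user is multilingual when more than one language survives the thresholds. A single stray tweet in another language (the "pt" above) is treated as noise rather than evidence of multilingualism.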
Bots and spam users
Remove users with no mentions (in-degree = 0)
Select only the largest weakly connected component (88% of nodes)
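The two network filters above can be sketched in pure Python on a directed edge list. Two assumptions apply. The in-degree filter is applied in a single pass, so users whose in-degree only drops to zero afterwards are kept. Weak connectivity ignores edge direction, so a union-find over the edges finds the components. The toy edge list is hypothetical.

```python
from collections import Counter


def clean_network(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop never-mentioned users, then keep the largest weakly connected component."""
    # Step 1: in-degree = number of times a user is mentioned/retweeted.
    indegree = Counter(target for _, target in edges)
    kept = [(s, t) for s, t in edges if indegree[s] > 0 and indegree[t] > 0]
    if not kept:
        return []

    # Step 2: union-find over undirected versions of the remaining edges.
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in kept:
        root_s, root_t = find(s), find(t)
        if root_s != root_t:
            parent[root_s] = root_t

    sizes = Counter(find(v) for v in list(parent))
    largest = max(sizes, key=sizes.get)
    return [(s, t) for s, t in kept if find(s) == largest]


# "x" is never mentioned, so its edge is dropped; {d, e} is a smaller component.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("x", "a"), ("d", "e"), ("e", "d")]
print(clean_network(edges))
```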
End result
916,836 nodes (users) and 2,652,618 directed edges (mentions/retweets)
Each user assigned their most-used language and the frequency [0-1] with which that language is used
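The per-user summary in the last bullet can be sketched as follows, assuming a hypothetical input of one detected language code per tweet for each user. A frequency below 1.0 indicates that the user also wrote in other languages. Ties are resolved by `Counter.most_common`, which orders equal counts by first appearance.

```python
from collections import Counter


def summarise_user(detected: list[str]) -> tuple[str, float]:
    """Return (most-used language, share of the user's tweets in it, 0-1)."""
    if not detected:
        raise ValueError("user has no classified tweets")
    lang, n = Counter(detected).most_common(1)[0]
    return lang, n / len(detected)


# Hypothetical per-user language labels, one per tweet.
users = {
    "user_a": ["en", "en", "en", "en"],
    "user_b": ["ja", "ja", "ja", "en"],
}
for user, langs in users.items():
    print(user, *summarise_user(langs))
```

This prints `user_a en 1.0` and `user_b ja 0.75`.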
14. Wikipedia: Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the month
Required at least four edits in total and at least two edits to one edition
Matching users and articles across languages
Looked for common usernames across language editions
Checked that usernames are indeed linked global accounts
Used the Wikidata dump to match articles across languages
55,568 users (excluding the Simple English edition) with a total of 3,518,955 edits.
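The per-edit filters and per-user thresholds can be sketched as follows. The `Edit` record and its fields are hypothetical, since the dump schema is not given on the slide. Namespace 0 is MediaWiki's main (article) namespace. Checks that need more data are left out: undeclared bots, single-session users, global-account matching and Wikidata article matching. The phrase "at least two edits to one edition" is read as at least two edits in the user's most-edited edition.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edit:
    # Hypothetical record layout; real dump fields differ.
    user: Optional[str]  # None for anonymous (IP) edits
    edition: str         # language edition code, e.g. "en" or "de"
    namespace: int       # 0 is the main (article) namespace
    minor: bool
    bot: bool            # True if the account is flagged as a bot


def filter_wikipedia_users(edits, min_edits=4, min_in_one_edition=2):
    """Return user -> Counter(edition -> edits) for users passing the thresholds."""
    per_user = defaultdict(Counter)
    for e in edits:
        # Keep only non-minor article edits by registered, non-bot accounts.
        if e.user is None or e.bot or e.minor or e.namespace != 0:
            continue
        per_user[e.user][e.edition] += 1
    return {
        user: counts
        for user, counts in per_user.items()
        if sum(counts.values()) >= min_edits
        and max(counts.values()) >= min_in_one_edition
    }
```

A user who edited more than one edition in the returned mapping would then count as multilingual.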
15. User counts
Twitter
Language User Count
English (en) 375,474
Japanese (ja) 137,263
Portuguese (pt) 133,501
Malay/Indonesian (ms) 106,223
Spanish (es) 70,246
Dutch (nl) 31,035
Korean (ko) 16,123
Thai (th) 8,629
Arabic (ar) 7,679
French (fr) 5,769
Filipino/Tagalog (fil) 5,393
Wikipedia
Language User Count
English 22,412
German 4,920
French 3,430
Russian 3,330
Spanish 3,299
Japanese 3,164
Italian 2,202
Chinese 1,975
Portuguese 1,220
Polish 1,011
Dutch 1,007
16. Twitter: Multilinguals vs Monolinguals
On Twitter, 11% of users (~103,000) were observed to use more than one language and were designated as multilingual users.
Multilingual vs. monolingual users: Comparison of tweet count, out-degree, and
in-degree.
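Designating multilingual users and comparing their activity can be sketched as below. The sketch assumes `users` maps each user to the per-language tweet counts that survive the language filtering above. The function name is hypothetical. Only tweet count is compared, although the slide also compares out-degree and in-degree. Medians are one reasonable summary of skewed activity distributions, not necessarily the statistic behind the figure. As a check on the slide's own numbers, about 103,000 multilingual users out of 916,836 is roughly 11%.

```python
from statistics import median


def compare_multilinguals(users):
    """users: mapping user -> {language: tweet count} after filtering.

    A user with more than one remaining language is treated as multilingual.
    Returns (share of users who are multilingual,
             median tweets per multilingual user,
             median tweets per monolingual user).
    """
    multi = [sum(c.values()) for c in users.values() if len(c) > 1]
    mono = [sum(c.values()) for c in users.values() if len(c) == 1]
    share = len(multi) / (len(multi) + len(mono))
    return share, median(multi), median(mono)
```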
18. Wikipedia: Multilinguals vs Monolinguals
On Wikipedia, 15.4% of users (8,544) edited more than one language edition and were designated as multilingual users.
Density plot compares the number of edits made by monolingual and multilingual Wikipedia users. The size of edits does not differ significantly.
Only 2.6% of edits are from users writing in their non-primary languages on Wikipedia.
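The share of edits made outside a user's primary language can be computed from per-user edit counts. This is a minimal sketch with a hypothetical function name. It assumes each user's primary edition is simply their most-edited one, because the exact definition behind the 2.6% figure is not given here.

```python
def non_primary_edit_share(users):
    """users: mapping user -> {edition: number of edits}.

    Assumes a user's primary edition is their most-edited one and returns
    the fraction of all edits made in any other edition.
    """
    total = outside = 0
    for counts in users.values():
        n = sum(counts.values())
        total += n
        outside += n - max(counts.values())
    return outside / total
```

For example, a user with 90 English and 10 German edits plus a user with 100 French edits gives 10 / 200 = 0.05.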
19. Twitter: Language and structure
Label propagation algorithm (Raghavan, Albert, & Kumara, 2007) found
20,253 communities.
Histograms of the size of communities (left) and the number of languages within
each community (right). Modularity score of 0.81 for this community structure.
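Label propagation (Raghavan, Albert, & Kumara, 2007) starts by giving every node its own label. It then repeatedly lets each node adopt the label most common among its neighbours, until every node holds such a label. Modularity then scores the resulting partition. The sketch below is a minimal stdlib implementation under stated assumptions:

- edges are treated as undirected;
- repeated edges count as extra weight;
- self-loops are dropped;
- the seed and iteration cap are arbitrary.

It is not the implementation that produced the 20,253 communities reported here.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, seed=0, max_iter=100):
    """Asynchronous label propagation on an undirected view of `edges`."""
    rng = random.Random(seed)
    neighbours = defaultdict(list)
    for u, v in edges:
        if u != v:  # self-loops ignored; repeated pairs act as weights
            neighbours[u].append(v)
            neighbours[v].append(u)
    labels = {node: node for node in neighbours}  # every node starts alone
    nodes = list(neighbours)

    def best_labels(node):
        counts = Counter(labels[n] for n in neighbours[node])
        top = max(counts.values())
        return [label for label, c in counts.items() if c == top]

    for _ in range(max_iter):
        rng.shuffle(nodes)  # fresh random visiting order each sweep
        for node in nodes:
            labels[node] = rng.choice(best_labels(node))  # random tie-break
        # Stop once every node carries a label most frequent among its neighbours.
        if all(labels[node] in best_labels(node) for node in nodes):
            break

    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


def modularity(edges, communities):
    """Newman-Girvan modularity of a partition, on the same undirected view."""
    pairs = [(u, v) for u, v in edges if u != v]
    m = len(pairs)
    community_of = {n: i for i, c in enumerate(communities) for n in c}
    internal, degree = Counter(), Counter()
    for u, v in pairs:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edges / m) - (total degree / 2m)^2
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

On two disconnected triangles, the sketch recovers the two triangles as communities, with a modularity of 0.5.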
20. Twitter: Language and structure
Scatter plot of community size and
the percentage of users in the
community most often using the most
prevalent language.
21. Language and structure
Most-used language   % users in most-used language   Number of languages   Number of nodes
Malay (ms)           78.3                            41                    123,616
English (en)         99.3                            39                    114,826
Portuguese (pt)      94.3                            40                    101,987
Japanese (ja)        99.6                            19                    83,785
English (en)         75.7                            44                    80,387
English (en)         55.1                            42                    37,688
Dutch (nl)           90.6                            23                    20,634
Table: Clusters with over 10,000 nodes found through the label propagation algorithm. Collectively, 61% of all users are in one of these clusters.
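The columns of this table can be computed for any community found by label propagation. This is a minimal sketch with a hypothetical function name. It assumes `primary_lang` maps each user to their most-used language.

```python
from collections import Counter


def community_language_profile(community, primary_lang):
    """Summarize one community in the table's four columns.

    Returns (most-used language, % of members whose most-used language it is,
    number of distinct languages, number of nodes).
    """
    counts = Counter(primary_lang[user] for user in community)
    lang, n = counts.most_common(1)[0]
    return lang, 100 * n / len(community), len(counts), len(community)
```

For example, a four-user community with three English users and one Japanese user yields `("en", 75.0, 2, 4)`.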
22. Twitter: Do multilinguals bridge clusters?
Size of the largest, weakly-connected component (left), total number of components
(center), and average size of the components (right) created by removing all
multilingual users, an equivalent number of monolingual users randomly, an
equivalent number of all users randomly, and removing all multilingual users from a
network with the same degree distribution but with edges randomly shuffled. Box
plots show values from 100 realizations. Mean values are indicated with +.
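The figure compares removing all multilingual users with removing the same number of random monolingual users. That comparison can be sketched as follows. This minimal version covers only two of the figure's four conditions. It omits random removal of all users and the degree-preserving shuffled network. It treats edges as undirected for weak connectivity, and the function names are hypothetical. The default `runs=100` mirrors the 100 realizations in the box plots.

```python
import random
from collections import defaultdict


def component_stats(nodes, edges):
    """Weakly-connected components of the subgraph induced by `nodes`.

    Returns (size of largest component, number of components, mean size);
    isolated nodes count as components of size 1.
    """
    nodes = set(nodes)
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)  # ignore direction: weak connectivity
            adjacency[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return max(sizes), len(sizes), sum(sizes) / len(sizes)


def removal_experiment(nodes, edges, multilingual, monolingual, runs=100, seed=0):
    """Remove all multilinguals once, and equally many random monolinguals `runs` times."""
    rng = random.Random(seed)
    nodes = set(nodes)
    target = component_stats(nodes - set(multilingual), edges)
    pool = sorted(monolingual)  # sorted so the sampling is reproducible
    baseline = [
        component_stats(nodes - set(rng.sample(pool, len(multilingual))), edges)
        for _ in range(runs)
    ]
    return target, baseline
```

If multilinguals act as bridges, `target` should show a smaller largest component and more components than most baseline runs.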
24. Wikipedia: Do multilinguals bridge editions?
Do multilinguals edit similar articles across languages?
A large number of users did not edit any of the same articles in their primary
languages, but a large number of users also always edited the same articles in their
primary languages.
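One way to quantify whether a multilingual user edits the same articles across editions is an overlap coefficient over Wikidata items. It is the share of items edited in the user's smaller edition that were also edited in the larger one. A value of 0 means no shared articles, and 1 means always the same articles. The sketch compares the two editions with the most distinct items edited. This measure, the hypothetical function name and the example item IDs are assumptions, not the study's stated method.

```python
def article_overlap(edits_by_edition):
    """edits_by_edition: mapping edition -> set of Wikidata item IDs edited.

    Compares the user's two largest editions (by distinct items) and returns
    |A & B| / |B|, where B is the smaller set (the overlap coefficient).
    """
    top_two = sorted(edits_by_edition.values(), key=len, reverse=True)[:2]
    if len(top_two) < 2 or not top_two[1]:
        raise ValueError("need edits in at least two editions")
    first, second = top_two
    return len(first & second) / len(second)
```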
25. Variations by language
Twitter Wikipedia
Number of users in each language compared to the percentage of these users
classified as multilingual.
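A rank correlation between language size and percentage multilingual can test whether smaller languages have a larger share of multilingual users. This sketch swaps in Spearman's rank correlation, since the slide does not name a statistic. `per_language`, `rank` and `spearman` are hypothetical names. The sketch assumes `users` maps each user to their primary language and a multilingual flag.

```python
from collections import Counter


def per_language(users):
    """users: mapping user -> (primary language, is_multilingual flag)."""
    totals, multi = Counter(), Counter()
    for lang, is_multi in users.values():
        totals[lang] += 1
        multi[lang] += int(is_multi)
    langs = sorted(totals)
    sizes = [totals[lang] for lang in langs]
    percent_multi = [100 * multi[lang] / totals[lang] for lang in langs]
    return langs, sizes, percent_multi


def rank(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A negative coefficient would indicate that less-represented languages have proportionally more multilingual users.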
26. Twitter: Cross-language connections
[Network figure; language nodes: ar, de, en, es, fil, fr, gl, it, ja, ko, ms, nl, pt, th]
Mentions and retweets across languages
Nodes represent most-used language
Directed, weighted edges show the log of the number of users primarily using one language who mention / retweet users in another language
Only edges with weights over 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
N.B. This differs from the published paper, in which edges were normalized by the expected number of connections between language pairs if tweets were directed at users randomly, without regard to language.
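The edge weighting and filtering described on this slide can be sketched as below. The sketch assumes `primary_lang` maps every user to their most-used language, and it follows the slide's unnormalized weighting rather than the published paper's normalization. The function name is hypothetical. Within-language edges are skipped. The population standard deviation and natural logarithm are assumptions. The logarithm base does not change which edges pass, because rescaling all weights by a constant rescales the cutoff by the same constant. Infomap community detection is not reproduced.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def language_edges(user_edges, primary_lang, z=1.96):
    """Directed language-to-language edges, filtered as on the slide.

    user_edges: iterable of (source user, target user) mention/retweet pairs.
    The weight of A -> B is log(number of distinct users whose most-used
    language is A and who mention/retweet at least one user whose most-used
    language is B). Only weights above mean + z * standard deviation are kept.
    """
    senders = defaultdict(set)
    for u, v in user_edges:
        a, b = primary_lang[u], primary_lang[v]
        if a != b:  # assumption: within-language edges are not drawn
            senders[(a, b)].add(u)
    weights = {pair: math.log(len(users)) for pair, users in senders.items()}
    values = list(weights.values())
    cutoff = mean(values) + z * pstdev(values)  # population SD (assumption)
    return {pair: w for pair, w in weights.items() if w > cutoff}
```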
27. Wikipedia: Language crossings
[Network figure; language-edition nodes: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh]
Co-editing network graph
Nodes represent language editions
Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
Only edges with weights over 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
28. Wikipedia: Language crossings (English removed)
[Network figure; language-edition nodes: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh]
Co-editing network graph
Nodes represent language editions
Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
Only edges with weights over 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
32. Summary and Implications
Multilingualism correlated with activity on both platforms
Design for multilingual users
Allow users to have multiple preferred languages when personalizing search results, friend recommendations, etc.
Structured by language
Language plays a strong role in structuring both platforms
Multilingual users are in a position to bridge clusters/editions, but evidence on their actual role is mixed
Multilingual user percentage ∝ 1/self-focus bias
Important per-language variations
Users in less-represented languages are more likely to cross language boundaries on Wikipedia, but there is no correlation on Twitter
Platform differences?
Consistent findings of English and Japanese as outliers
Larger languages form bridges
Especially English, but other geolinguistic patterns are evident
Global connectivity results from the combination of multilinguals across many language pairs
33. Design and Multilingual Users
on Twitter and Wikipedia
Scott A. Hale
scott.hale@oii.ox.ac.uk
http://www.scotthale.net/
Oxford Internet Institute
University of Oxford
17 June 2014
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall
who provided helpful comments on various aspects of this research.
34. References
Barnett, G. A., & Choi, Y. (1995). Physical Distance and Language as Determinants of the International Telecommunications Network. International Political Science Review, 16(3), 249–265. Available from http://ips.sagepub.com/content/16/3/249.abstract
Birner, B. (2005). Bilingualism (Tech. Rep.). Washington, DC, USA: Linguistic Society of America. Available from http://www.linguisticsociety.org/files/Bilingual.pdf
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks: How Multilingual Users of Twitter Connect Language Communities. Proceedings of the American Society for Information Science and Technology, 49(1), 1–4. Available from http://dx.doi.org/10.1002/meet.14504901327
Erard, M. (2012, January). Are We Really Monolingual? Available from http://www.nytimes.com/2012/01/15/opinion/sunday/are-we-really-monolingual.html
35. References (continued)
Graham, M., Hale, S. A., & Gaffney, D. (2013). Where in the world are you? Geolocation and language identification in Twitter. Professional Geographer.
Hale, S. A. (2012a). Impact of platform design on cross-language information exchange. In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems Extended Abstracts (pp. 1363–1368). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2212776.2212456
Hale, S. A. (2012b). Net Increase? Cross-Lingual Linking in the Blogosphere. Journal of Computer-Mediated Communication, 17(2), 135–151. Available from http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2011.01568.x/full
Hale, S. A. (2014a). Global Connectivity and Multilinguals in the Twitter Network. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203
36. References (continued)
Hale, S. A. (2014b). Multilinguals and Wikipedia Editing. In Proceedings of the 6th Annual ACM Web Science Conference. New York, NY, USA: ACM. Available from http://arxiv.org/abs/1312.0976
Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the Fourth International Conference on Communities and Technologies (pp. 11–20). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1556460.1556463
Hecht, B., & Gergle, D. (2010a). On the “localness” of user-generated content. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (pp. 229–232). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1718918.1718962
Hecht, B., & Gergle, D. (2010b). The Tower of Babel meets Web 2.0: User-generated content and its applications in a multilingual context. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370
37. References (continued)
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E., Stoerger, S., et al. (2007). Language Networks on LiveJournal. In Proceedings of the 40th Annual Hawaii International Conference on System Sciences. Washington, DC, USA: IEEE Computer Society. Available from http://dx.doi.org/10.1109/HICSS.2007.320
Nordenstreng, K., & Varis, T. (1974). Television traffic: A one-way street? A survey and analysis of the international flow of television programme material. Reports and Papers on Mass Communication, 70.
Raghavan, U. N., Albert, R., & Kumara, S. (2007, September). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106. Available from http://link.aps.org/doi/10.1103/PhysRevE.76.036106
Takhteyev, Y., Gruzd, A., & Wellman, B. (2011). Geography of Twitter networks. Social Networks, 1–26. Available from http://www.sciencedirect.com/science/article/pii/S0378873311000359#FCANote
38. References (continued)
Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language and Internet diffusion patterns in Uzbekistan. New Review of Hypermedia and Multimedia, 11(2), 205–220.
Wilkinson, D., & Thelwall, M. (2012). Trending Twitter topics in English: An international comparison. Journal of the American Society for Information Science and Technology, 63(8), 1631–1646. Available from http://dx.doi.org/10.1002/asi.22713
Zuckerman, E. (2008). Meet the bridgebloggers. Public Choice, 134(1), 47–65.
Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of Connection. London: W. W. Norton & Company.