http://www.scotthale.net/pubs/?websci2014
This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.
Global connectivity and multilinguals in the Twitter network (slides) | Scott A. Hale
This article analyzes the global connectivity of the Twitter retweet and mentions network and the role of multilingual users engaging with content in multiple languages. The network is heavily structured by language with most mentions and retweets directed to users writing in the same language. Users writing in multiple languages are more active, authoring more tweets than monolingual users. These multilingual users play an important bridging role in the global connectivity of the network. The mean level of insularity from speakers in each language does not correlate straightforwardly with the size of the user base as predicted by previous research. Finally, the English language does play more of a bridging role than other languages, but the role played collectively by multilingual users across different languages is the largest bridging force in the network. Full paper at http://www.scotthale.net/pubs/?chi2014
eMargin Presentation given to Skills Funding Agency | RDUES
Presentation on the eMargin collaborative text annotation tool given to the Skills Funding Agency. Also contains description of AHRC Knowledge Transfer Fellowship project, working with A Level English Language students.
Okay, so the best way to find out something is to ask someone. But what's the best way to ask so that you get an answer to the question that you meant to ask and not to the question they thought that you asked? Join Kathryn Brockmeier, Nebraska Library Commission Research Analyst, for some tips and techniques for getting the information you need.
Editing Wikipedia: Why You Should and How You Can Support Your Users | lisbk
Slides for a talk on "Editing Wikipedia: Why You Should and How You Can Support Your Users" to be given by Brian Kelly, Innovation Advocate at Cetis at the CILIP Wales 2014 conference in Cardiff on 15 May 2014.
See http://ukwebfocus.wordpress.com/events/cilip-wales-2014-editing-wikipedia/
and
http://ukwebfocus.wordpress.com/2014/05/14/top-wikipedia-tips-for-librarians/
Design and Multilingual Users on Twitter and Wikipedia | Scott A. Hale
Presentation given at MIT Media Lab on June 17, 2014. Presents ongoing work on design and multilingual users. Two recent papers are "Global Connectivity and Multilinguals in the Twitter Network" (http://www.scotthale.net/pubs/?chi2014) and "Multilinguals and Wikipedia Editing" (http://www.scotthale.net/pubs/?websci2014)
Your Global Audience is Already Here: How to Create Content that Communicates... | Scott Abel
Presented by Ann Zdunczyk at Documentation and Training Life Sciences, June 23-26, 2008 in Indianapolis, IN.
English is one of the most expressive languages on Earth; with a vocabulary of over 900,000 words, no wonder there are so many ways to say the same thing! Mission-critical, life-saving messages must be communicated as clearly in English as in the target languages. Even if your content is still “English only”, this presentation will give you insights into communicating your intent more effectively, in words and images, to a diverse audience. Find out what global forces are eroding market boundaries and helping “make the world flat,” broadening your future audience to include languages you may not have considered before.
This presentation will cover many considerations, including:
* Is your content written as clearly and as to the point as possible?
* Does your content use consistent terminology?
* Has your company acquired other subsidiary divisions that have different standards for writing and managing content and language translation? If so, how do you coordinate your efforts in this arena?
* How do you optimize English source content to leverage as much previously translated text from legacy material as possible?
* How can a professional linguist be certain of your intent during translation?
* How can you validate content translated for overseas markets?
* When does “fancy” formatting and page layout become an impediment to language translation?
No doubt you’ve already heard about Controlled English, and the many challenges to effectively translating rich, technical content from English to other languages. At first glance, the task can seem overwhelming. Believe it or not, you are already “shifting gears” and writing at different levels of English for different audiences. The same skills you use every day in editing your own email can be transposed to effectively create focused, technical content for a broad global audience.
Domestically, a significant proportion of medical staff are non-native English speakers. In an emergency, all staff must instantly grasp the intent of written instructions on complex equipment. The “life-saving” ramifications of your content become even more pronounced when your words are translated from English to another language. Attend this session to learn even more ways to avoid errors and save lives. (And you thought you were just creating content!)
Multilingual user interface for website using resource files | eSAT Journals
Abstract: Due to the rapid growth of internet usage, new kinds of problems keep emerging, one of which is the language barrier among different Internet users, so it is important to build a system that overcomes this barrier. Some websites do provide language-switching options, but these solutions are not satisfactory because separate pages need to be developed for each language, which wastes a lot of resources. This paper discusses the multilingual implementation of a website using the concept of resource files in ASP.NET. ASP.NET and the .NET Framework ship with support for multilingual applications in the form of resource files. Multilingual User Interface (MUI) enables the localization of user interfaces for globalized applications. MUI also supports the creation of resources for any number of user interface languages. MUI ships a single core functionality to all platforms independent of UI language, which significantly reduces development and testing effort. The most visible benefit of MUI is that multiple users can share the same webpage and view the user interface in different languages. Multilingual accessibility of website content makes it significantly easier for multilingual users to meet their needs. Keywords: Language Barriers, Multilingual Website, Resource File.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Writing aids (namely: spellchecker, thesaurus, hyphenation patterns, grammar checker) for OpenOffice.org can always be improved and streamlined. The best environment for a collaborative effort to create and improve such tools is the local Native-Lang community: you can get work done by people who use and appreciate OOo, and reward the community by making their work available in the following releases.
However, a number of issues must be solved to ensure the success of such a community project. We will examine some of them, such as: the technical expertise needed to build and maintain the individual tools and extension packages; the licenses and legal reviews needed to get tools included in the official builds of OpenOffice.org; the management of roles within the community and effective delegation; and the need for readily accessible and easy-to-use interfaces for newcomers who just want to quickly contribute an idea or improvement.
Increasing access to free and open knowledge for speakers of underserved lang... | Lucie-Aimée Kaffee
Slides for the talk at FOSDEM 2016 on increasing access to free and open knowledge for speakers of underserved languages on Wikipedia, with the help of Wikidata, in the ArticlePlaceholder project.
I gave this presentation at the first PKP Scholarly Publishing Conference in Vancouver, Canada, on July 12th 2007. Check out the general conference blog if you want to know more about the event:
http://scholarlypublishing.blogspot.com/
You may also be interested in things marked with the "open-access" tag in my own blog:
http://corpblawg.ynada.com/
This paper presents a method of applying a speaker-independent, bidirectional speech-to-speech translation system for spontaneous dialogs in a real-time calling system. This technique recognizes spoken input, analyzes and translates it, and finally utters the translation. The major part of speech translation falls under natural language processing, a branch of artificial intelligence that deals with analyzing, understanding and generating the languages that humans use naturally, so that people can interface with computers in both written and spoken contexts using natural human languages instead of computer languages. Speech translation involves techniques to translate spoken sentences from one language to another. The major part of speech translation involves speech recognition, which is the conversion of speech to text and the identification of the context and linguistic structure of the input speech. In the current scenario, the machine does not identify whether a given word is in the past or present tense. Using the algorithm, we check whether a word is past or present by searching for substrings such as “ed”, “had”, “Done”, etc. This paper gives an idea of working with APIs to translate the input speech to the required output speech, thus increasing the efficiency of speech translation on cellular devices, and also describes a mobile application that will help us monitor all the audio present on a mobile device and translate it into the required language.
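The tense check mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `looks_past_tense`, the marker lists, and the choice to match whole words and the word ending "ed" (rather than arbitrary substrings) are assumptions made here for illustration.

```python
import re

# Hypothetical marker lists, based only on the examples named in the abstract.
PAST_WORDS = {"had", "done"}   # whole words treated as past-tense markers
PAST_SUFFIX = "ed"             # word ending treated as a past-tense marker


def looks_past_tense(sentence: str) -> bool:
    """Return True if any word in the sentence carries a past-tense marker."""
    # Lowercase the text and split it into alphabetic words.
    words = re.findall(r"[a-z']+", sentence.lower())
    return any(w in PAST_WORDS or w.endswith(PAST_SUFFIX) for w in words)
```

A heuristic like this is cheap but crude: present-tense words that happen to end in "ed", such as "need", would be flagged as past tense, so a real system would likely need a part-of-speech tagger or a dictionary lookup.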
Dr Scott A Hale introduced and facilitated discussion on the latest research updates and research needs at the Trusted Media Summit in December 2019. This summit brought together media organizations throughout APAC.
Big Tech & Disinformation: What are the main threats and how can journalists ... | Scott A. Hale
Dr Scott A Hale presented these slides at the 2019 News Impact Summit in Lyon, France, hosted by The European Journalism Centre and Google News Initiative
https://newsimpact.io/summits/news-impact-summit-lyon
Similar to Multilinguals and Wikipedia Editing (20)
Foreign-language Reviews: Help or Hindrance? (Slides) | Scott A. Hale
Full paper at http://scott.hale.us/pubs/?chi2017
The number and quality of user reviews greatly affect consumer purchasing decisions. While reviews in all languages are increasing, it is still often the case (especially for non-English speakers) that there are only a few reviews in a person’s first language. Using an online experiment, we examine the value that potential purchasers receive from interfaces showing additional reviews in a second language. The results paint a complicated picture with both positive and negative reactions to the inclusion of foreign-language reviews. Roughly 26–28% of subjects clicked to see translations of the foreign-language content when given the opportunity, and those who did so were more likely to select the product with foreign-language reviews than those who did not.
How much is said in a microblog? A multilingual inquiry based on Weibo and Tw... | Scott A. Hale
Slides from the 2015 Web Science Conference presentation measuring how much can be and is said in microblog posts of different languages on Twitter and Weibo. Full paper at http://arxiv.org/abs/1506.00572
Mapping the UK Webspace: Fifteen Years of British Universities on the Web | Scott A. Hale
Full paper at http://arxiv.org/abs/1405.2856
This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.
Slides for a presentation on recent work with Web Archives at the Oxford Internet Institute (http://www.oii.ox.ac.uk/) given at WIRE2014 (http://wp.comminfo.rutgers.edu/nsfia/schedule/)
ECPR 2011 Leaders and Followers Experiment | Scott A. Hale
Leadership without Leaders? Starters and Followers in On-line Collective Action. These slides are from a presentation at the ‘Collective Action’ panel 517, ECPR Conference, Reykjavik, 26th August 2011. More information is available at http://www.governmentontheweb.org/
Opendatabay - Open Data Marketplace.pptx | Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first-ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTES | Subhajit Sahu
Graph algorithms, like PageRank ... Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... | John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... | Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables the calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph has no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
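The idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the report's implementation: the function names are hypothetical, Kosaraju's algorithm is used here to find the strongly connected components (the abstract does not say which method the report uses), components are processed one at a time rather than grouped into levels, the damping factor 0.85 is an assumed default, and the recursive traversal is only suitable for small graphs.

```python
def sccs_in_topological_order(out_edges):
    """Kosaraju's algorithm; components come out in topological order."""
    n = len(out_edges)
    in_edges = [[] for _ in range(n)]
    for u, targets in enumerate(out_edges):
        for v in targets:
            in_edges[v].append(u)

    seen, finish_order = [False] * n, []

    def visit(u):  # first pass: record finish order on the original graph
        seen[u] = True
        for v in out_edges[u]:
            if not seen[v]:
                visit(v)
        finish_order.append(u)

    for u in range(n):
        if not seen[u]:
            visit(u)

    comp_of, comps = [-1] * n, []

    def assign(u, c):  # second pass: walk the reversed graph
        comp_of[u] = c
        comps[c].append(u)
        for v in in_edges[u]:
            if comp_of[v] == -1:
                assign(v, c)

    for u in reversed(finish_order):
        if comp_of[u] == -1:
            comps.append([])
            assign(u, len(comps) - 1)
    return comps, in_edges


def levelwise_pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=500):
    """Compute PageRank one strongly connected component at a time."""
    n = len(out_edges)
    # Precondition stated in the abstract: the graph has no dead ends.
    assert all(out_edges), "every vertex needs at least one out-edge"
    comps, in_edges = sccs_in_topological_order(out_edges)
    rank = [1.0 / n] * n
    for members in comps:
        # In-neighbours outside this component belong to earlier components,
        # whose ranks are already final, so only `members` are iterated.
        for _ in range(max_iter):
            delta = 0.0
            for v in members:
                incoming = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
                new_rank = (1.0 - damping) / n + damping * incoming
                delta += abs(new_rank - rank[v])
                rank[v] = new_rank
            if delta < tol:
                break
    return rank
```

For example, `levelwise_pagerank([[1], [0, 2], [3], [2]])` first converges the component containing vertices 0 and 1, then uses those fixed ranks while iterating the component containing vertices 2 and 3. Because each component is iterated on its own, no information has to flow back to earlier components, which is what removes the need for per-iteration communication in a distributed setting.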
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
1. Multilinguals and Wikipedia Editing
Scott A. Hale
Oxford Internet Institute
http://www.scotthale.net/pubs/?websci2014
25 June 2014
Scott A. Hale, @computermacgyve Multilinguals and Wikipedia Editing
2. Background, Motivations
Wikipedia is a global platform covering hundreds of languages,
despite evidence of balkanization (Taneja & Wu, in press)
Past studies generally concentrate on one edition (usually English)
Important variations across languages
Content is diverse across languages (Hecht & Gergle, 2010)
Each edition of Wikipedia shows a self-focus bias with more articles
about regions where the language is spoken (Hecht & Gergle, 2009)
Multilingual users may act as unconscious translators bridging language
divides (Herring et al., 2007; Eleta & Golbeck, 2012)
4. Related work
Why edit Wikipedia in a foreign language?
Increased audience size (Crystal, 2003; Zuckerman, 2013)
In a survey in Uzbekistan, Internet users reported accessing content in
foreign languages even while simultaneously reporting poor foreign
language skills (Wei & Kolko, 2005)
Editors of many editions of Wikipedia come from a wide variety of
timezones suggesting that bilingual editors are present (Yasseri et al.,
2012)
In a survey of editors, half of all editors reported editing in multiple
languages and 72% reported reading more than one language edition of
Wikipedia.†
† https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990
5. Hypotheses
1 Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
6. Data
All edits to any of the top 46 language editions (all editions with at
least 100,000 articles)
Recorded via the IRC stream
(code at http://www.scotthale.net/pubs/?websci2014)
32 days (8 July to 9 August 2013)
Edit meta-data
datetime
edition
article title
username
size of edit
flags (minor, bot, etc.)
7. Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the
month
Require at least four edits and at least 2 edits to one edition
Matching users and articles across languages
Look for common usernames across language editions
Check usernames are indeed linked global accounts
WikiData dump to match articles across languages
55,568 users with a total of 3,518,955 edits (excluding the Simple English
edition).
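The filtering steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code (which is linked from the project page): the `Edit` record, its field names, and the `clean` function are hypothetical, and the steps that rely on edit sessions or on detecting undeclared bots are left out.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical edit record; field names are assumptions."""
    edition: str
    title: str
    user: str
    minor: bool
    bot: bool
    anonymous: bool
    namespace: int  # 0 is assumed to mean the main (article) namespace


def clean(edits, min_total=4, min_in_one_edition=2):
    """Keep non-minor edits by registered, non-bot users to articles,
    then keep only users with enough edits overall and in one edition."""
    kept = [
        e for e in edits
        if not e.minor and not e.bot and not e.anonymous and e.namespace == 0
    ]
    # Count each remaining user's edits per edition.
    per_user = defaultdict(Counter)
    for e in kept:
        per_user[e.user][e.edition] += 1
    active = {
        user for user, counts in per_user.items()
        if sum(counts.values()) >= min_total
        and max(counts.values()) >= min_in_one_edition
    }
    return [e for e in kept if e.user in active]
```

The two thresholds mirror the "at least four edits and at least 2 edits to one edition" rule; how the original pipeline ordered these filters is not stated on the slide.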
8. Data summary
Language    Edits      Articles  Users   NP users  NP edits
English     1,389,647  518,405   27,476  18%       3%
German        256,495  125,647    5,967  18%       2%
French        250,828  106,027    4,549  25%       3%
Spanish       191,934   66,848    4,338  24%       3%
Russian       239,267   92,326    3,961  16%       1%
Japanese      106,848   56,406    3,551  11%       2%
Italian       160,191   69,534    2,919  25%       2%
Chinese       112,888   42,937    2,309  14%       1%
Portuguese     67,505   32,753    1,730  29%       4%
Dutch          80,535   39,463    1,500  33%       3%
Polish         67,038   37,393    1,454  30%       3%
Top language editions: The Users column includes all users who edited the edition
during the data collection period. A percentage of these users (NP users) are
non-primary users who edited a different language edition more frequently.
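The "NP users" column can be illustrated with a short Python sketch. It assumes a user's primary edition is simply the edition they edited most often in the sample; the function names and the `(user, edition)` input format are hypothetical, and ties are left to whatever `Counter.most_common` returns.

```python
from collections import Counter, defaultdict


def edit_counts(pairs):
    """Count edits per user and edition from (user, edition) pairs."""
    counts = defaultdict(Counter)
    for user, edition in pairs:
        counts[user][edition] += 1
    return counts


def primary_edition(counter):
    """Edition a user edited most often (tie handling is an assumption)."""
    return counter.most_common(1)[0][0]


def np_user_share(counts, edition):
    """Share of an edition's users whose primary edition is a different one."""
    users = [u for u, c in counts.items() if c[edition] > 0]
    if not users:
        return None
    non_primary = [u for u in users if primary_edition(counts[u]) != edition]
    return len(non_primary) / len(users)
```

For example, if three users edited the German edition and two of them edited English more often, `np_user_share(counts, "de")` gives two thirds.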
9. Multilinguals vs Monolinguals
15.4% of users (8,544) edited multiple language editions.
Figure: Density plot comparing the number of edits made by monolingual and
multilingual Wikipedia users.
10. Hypotheses
Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
11. What do multilinguals edit?
Only 2.6% of edits are
from users writing in their
non-primary languages.
44% of the articles edited
by multilingual users in
their non-primary
languages were not edited
by any monolingual user
2D density plot of the number of multilingual
users editing articles in a non-primary language
against the number of monolingual users editing
the articles.
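A share like the 44% figure above could be computed along the following lines. This is only a sketch: the `(user, edition, title)` input format, the `primary` mapping, and the `multilingual` set are hypothetical inputs, not names from the original analysis.

```python
def share_without_monolingual_editors(edits, primary, multilingual):
    """Fraction of articles edited by multilingual users in a non-primary
    language that no monolingual user edited.

    edits: iterable of (user, edition, title) tuples
    primary: dict mapping each multilingual user to a primary edition
    multilingual: set of users who edited more than one edition
    """
    non_primary_articles = set()
    monolingual_articles = set()
    for user, edition, title in edits:
        article = (edition, title)
        if user in multilingual:
            if edition != primary[user]:
                non_primary_articles.add(article)
        else:
            monolingual_articles.add(article)
    if not non_primary_articles:
        return None
    untouched = non_primary_articles - monolingual_articles
    return len(untouched) / len(non_primary_articles)
```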
12. What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages. The distribution is
bimodal. A large number of users did not edit any of the same articles in their
primary languages, but a large number of users always edited the same articles in
their primary languages.
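The quantity plotted in this histogram can be sketched per user as follows. The sketch assumes articles have already been matched across editions by a shared identifier (for example a Wikidata item ID); the function name, the `(edition, item_id)` input format, and the optional `available` argument are hypothetical.

```python
def overlap_fraction(user_edits, primary, available=None):
    """Fraction of the articles a user edited in non-primary editions
    that the same user also edited in the primary edition.

    user_edits: iterable of (edition, item_id) pairs for one user
    primary: that user's primary edition
    available: optional set of item IDs that exist in the primary edition;
               if given, other articles are ignored
    """
    user_edits = list(user_edits)
    primary_items = {item for ed, item in user_edits if ed == primary}
    other_items = {item for ed, item in user_edits if ed != primary}
    if available is not None:
        other_items &= available
    if not other_items:
        return None  # nothing to compare for this user
    return len(other_items & primary_items) / len(other_items)
```

A value of 0 corresponds to users who never edited the same article in their primary language, and 1 to users who always did, the two peaks of the bimodal distribution. Passing `available` mimics restricting the comparison to articles that exist in the user's primary language.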
13. What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages after removing
edits to articles that do not exist in users’ primary languages.
14. Hypotheses
Most editors will edit only one language edition
Multilingual users will edit different articles than monolingual users
Ö When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
15. Variations by language
Scatter plot of language size (number of unique users) and percentage of users who
are multilingual (edit more than one language edition). The three editions with fewer
than 10 users in the sample are omitted (Uzbek, Cebuano, and Waray-Waray).
16. Language crossings
Figure (node labels): ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh
Co-editing network graph
Nodes represent language
editions
Directed, weighted edges show
the log of the number of users
primarily editing one language
edition who edited another
edition
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
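The edge-filtering rule described above can be sketched in Python. The slide does not say which logarithm base or which standard-deviation estimator was used, so the natural log and the population standard deviation here are assumptions, as are the function name and the `crossings` input format.

```python
import math
from statistics import mean, pstdev


def significant_edges(crossings, z=1.96):
    """Keep edges whose log-weight exceeds the mean by more than z std devs.

    crossings: dict mapping (primary_edition, other_edition) to the number
               of users primarily editing the first who also edited the second
    """
    # Edge weight is the log of the user count (natural log assumed).
    weights = {pair: math.log(n) for pair, n in crossings.items() if n > 0}
    if not weights:
        return {}
    cutoff = mean(weights.values()) + z * pstdev(weights.values())
    return {pair: w for pair, w in weights.items() if w > cutoff}
```

Community detection with infomap would then run on the full weighted graph or on the filtered one; the slide does not specify which, so that step is not sketched here.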
17. Language crossings (English removed)
Figure (node labels): ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh
Co-editing network graph
Nodes represent language
editions
Directed, weighted edges show
the log of the number of users
primarily editing one language
edition who edited another
edition
Only edges with weights over
1.96 standard deviations above
the mean are shown
Colors indicate communities
found by the infomap community
detection algorithm
18. Hypotheses
Most editors will edit only one language edition
Multilingual users will edit different articles than monolingual users
Ö When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
19. Simple English
No big changes if Simple English edition is considered
Largest editor overlap with English edition
Dedicated group of editors:
45% of editors editing Simple most frequently do not edit any other
edition (similar to Esperanto)
20. Comparison with Twitter
Similar percentage of users are multilingual (11% on Twitter)
Similar correlation between activity level and multilingualism
Language size not correlated with multilingualism on Twitter;
some language consistencies (Japanese, English) and some variations
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network.
http://www.scotthale.net/pubs/?chi2014
21. Implications and future directions
Implications
Multilingual users found in all
editions; correlation with activity
Design for multilingual users
(universal language selector and
global accounts already progress
in this direction)
Important per-language
variations
Inverse correlation between
multilingual users and self-focus
bias as measured by Hecht &
Gergle (2009)
Further work
Move from edit meta-data to
edit content itself
What type of edits are users
making in non-primary
languages?
Variations by topic/theme?
Correlations with link/image
overlap?
Viewing vs. editing behavior
(survey results show much higher
percentage of users read multiple
editions)
22. Multilinguals and Wikipedia Editing
Scott A. Hale
Oxford Internet Institute
http://www.scotthale.net/pubs/?websci2014
25 June 2014
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall as
well as the anonymous reviewers who provided helpful comments on previous versions of
this research.
23. Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge:
Cambridge University Press.
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks:
How Multilingual Users of Twitter Connect Language Communities.
Proceedings of the American Society for Information Science and
Technology, 49(1), 1–4. Available from
http://dx.doi.org/10.1002/meet.14504901327
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter
Network. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (pp. 833–842). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/2556288.2557203
Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in
community-maintained knowledge repositories. In Proceedings of the
fourth international conference on communities and technologies (pp.
11–20). New York, NY, USA: ACM. Available from
http://doi.acm.org/10.1145/1556460.1556463
24. Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0:
User-generated content and its applications in a multilingual context.
In Proceedings of the 28th international conference on human factors
in computing systems (pp. 291–300). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/1753326.1753370
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E.,
Stoerger, S., et al. (2007). Language Networks on LiveJournal. In
Proceedings of the 40th annual hawaii international conference on
system sciences. Washington, DC, USA: IEEE Computer Society.
Available from http://dx.doi.org/10.1109/HICSS.2007.320
Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language
and Internet diffusion patterns in Uzbekistan. New Review of
Hypermedia and Multimedia, 11(2), 205–220.
Yasseri, T., Sumi, R., & Kertész, J. (2012). Circadian Patterns of Wikipedia
Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091.
Available from http://dx.doi.org/10.1371%2Fjournal.pone.0030091
25. Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of
Connection. London: W. W. Norton & Company.