Han-Teng Liao successfully defended his PhD
at the Oxford Internet Institute (OII) in July 2014.
His research focuses on user-generated
content and data, Web analytics (webometrics),
Chinese Internet research, and integrated
digital research designs (both
qualitative and quantitative).
Thomas Petzold is a social technology analyst, TED
speaker and professor of media management at HMKW
– University of Applied Sciences for Media,
Communication and Management in Berlin, Germany.
As a research fellow at the WZB (2011–2013), he led a
project on languages and big data in social technology.
[photo: David Ausserhofer]
Abstract
What is Data Normalization?
Finer normalization: geolinguistic unit
A language tag:
• Often starts with a language code followed by a country code
• e.g. “fr‐CA” = the geolinguistic unit of French used in Canada. 
• Has corresponding data points in Unicode’s Common Locale 
Data Repository (CLDR) project. 
• e.g. “fr‐CA” = a language population of 7,605,004 [12]
Finer geolinguistic data normalization is useful …
• for finer comparison between, say, Arabic speakers in Egypt and 
in Saudi Arabia, or Spanish speakers in Spain and in Mexico
• for analysts or designers to better know, and thus support, their 
users by providing appropriate interfaces and content [7]
• for better understanding of the Wikipedia traffic data
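As a rough illustration of how a language tag can be tied to a population figure, the Python sketch below splits a tag into its language and territory parts and looks up a speaker count. The names `GeolinguisticUnit`, `parse_language_tag`, `language_population` and `LANGUAGE_POPULATION` are hypothetical and introduced only for this example. The single table entry is the "fr-CA" figure quoted from CLDR [12]. The parser handles only the simple language-territory form, not every possible language tag.

```python
from typing import NamedTuple, Optional


class GeolinguisticUnit(NamedTuple):
    language: str   # e.g. "fr"
    territory: str  # e.g. "CA"


# Hypothetical lookup table; the single entry is the figure quoted from CLDR [12].
LANGUAGE_POPULATION = {
    GeolinguisticUnit("fr", "CA"): 7_605_004,
}


def parse_language_tag(tag: str) -> GeolinguisticUnit:
    """Split a simple 'language-territory' tag such as 'fr-CA'."""
    # Accept both the hyphen and the underscore used in some locale names.
    parts = tag.replace("_", "-").split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'language-territory', got {tag!r}")
    language, territory = parts
    return GeolinguisticUnit(language.lower(), territory.upper())


def language_population(tag: str) -> Optional[int]:
    """Return the speaker count for a tag, or None if it is not in the table."""
    return LANGUAGE_POPULATION.get(parse_language_tag(tag))


print(parse_language_tag("fr-CA"))
print(language_population("fr-CA"))
```

In practice the table would be filled from the CLDR language-territory data rather than typed by hand.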
References (partial: those mentioned in this poster)
[1] American Planning Association 2006.
Planning and Urban Design Standards.
John Wiley & Sons.
[2] Cote, P. Effective Cartography:
Mapping with Quantitative Data. Harvard
Graduate School of Design.
[3] Crowston, K. et al. 2013. Sustainability
of Open Collaborative Communities:
Analyzing Recruitment Efficiency.
Technology Innovation Management
Review. January: Open Source
Sustainability (2013).
Acknowledgments
We thank Wikimedia UK for the scholarship that enabled Han‐Teng 
Liao to present the findings at OpenSym 2014. We also 
acknowledge the open source software tool Scrapy for 
making the web mining tasks easier.
Data normalization, or geographic normalization, allows data to be 
compared using a sensible common denominator, thereby 
producing measurements of intensity or density, such as 
population density [1, 2]
Data normalization is useful …
• in “factoring out the size” in order to facilitate comparisons 
across unequal areas or populations [2]
• in dividing a certain numeric attribute (e.g. GDP) 
by another (e.g. population), 
so as to derive a third numeric attribute (e.g. GDP per capita)
• in minimizing the differences caused by the size of a geographic 
unit
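The division described in the list above can be sketched in a few lines of Python. The `normalize` helper and the two areas with their figures are hypothetical, chosen only to show how "factoring out the size" can reverse a ranking based on raw totals.

```python
def normalize(attribute: float, size: float) -> float:
    """Divide a numeric attribute by a size measure (the common denominator)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return attribute / size


# Hypothetical figures for two areas of unequal size.
areas = {
    "Area A": {"gdp": 500_000_000.0, "population": 10_000},
    "Area B": {"gdp": 900_000_000.0, "population": 45_000},
}

# Derive a new attribute (GDP per capita) from two existing ones.
gdp_per_capita = {
    name: normalize(values["gdp"], values["population"])
    for name, values in areas.items()
}

for name, value in gdp_per_capita.items():
    print(f"{name}: {value:,.0f} per capita")
```

Area B has the larger raw GDP, but Area A comes out ahead once the population size is factored out.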
The proposed normalization is similar to that of Crowston, Julien and Ortega [3] in “factoring out the 
size” but differs in the choice of size unit.
• Crowston et al. [3] proposed a measurement to 
compare how efficiently a language version turns potential users 
into actual contributors.
• They found “a strong (but not perfect) correlation” between 
the total number of Wikipedia contributors on one side, and 
the Internet population and total tertiary‐educated population 
on the other. 
Han‐Teng Liao (hanteng@gmail.com) and Thomas Petzold (t.petzold@hmkw.de)
Geographic And Linguistic Normalization:
Towards a better understanding of the geolinguistic dynamics of knowledge
OpenSym '14, Aug 27‐29 2014,
Berlin, Germany
ACM 978‐1‐4503‐3016‐9/14/08.
http://dx.doi.org/10.1145/2641580.2641623
We propose a method of geo‐linguistic normalization to advance 
the existing comparative analysis of open collaborative 
communities, with multilingual Wikipedia projects as the example. 
Such normalization requires data regarding the potential users 
and/or resources of a geolinguistic unit.
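One way such a normalization could be computed is sketched below in Python. It divides each geolinguistic unit's share of traffic by its share of the language population. This ratio is an assumption made for illustration and is not necessarily the exact formula behind the poster's figures. The `normalized_shares` function, the unit labels "xx-AA" and "xx-BB", and all numbers are hypothetical.

```python
from typing import Dict


def normalized_shares(
    traffic: Dict[str, float], population: Dict[str, float]
) -> Dict[str, float]:
    """Ratio of traffic share to language-population share for each unit."""
    total_traffic = sum(traffic.values())
    total_population = sum(population[unit] for unit in traffic)
    if total_traffic <= 0 or total_population <= 0:
        raise ValueError("totals must be positive")
    result = {}
    for unit, views in traffic.items():
        if population[unit] <= 0:
            raise ValueError(f"population for {unit!r} must be positive")
        traffic_share = views / total_traffic
        population_share = population[unit] / total_population
        result[unit] = traffic_share / population_share
    return result


# Hypothetical page views and language populations for two units.
page_views = {"xx-AA": 600.0, "xx-BB": 400.0}
speakers = {"xx-AA": 3000.0, "xx-BB": 1000.0}

normalized = normalized_shares(page_views, speakers)
for unit, value in normalized.items():
    print(f"{unit}: {value:.2f}")
```

Under this assumed definition, a value above 1 means a unit accounts for more traffic than its share of the language population would suggest, and a value below 1 means less.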
Comparing results: before and after data normalization
Arabic Wikipedia viewing traffic
Figure 1. Viewing traffic trend lines. Y-axis: percent of the traffic; x-axis: year/month; series (pgViews_perLang): Egypt, Saudi Arabia, Other, Algeria.
Figure 3. Normalized viewing traffic trend lines. Y-axis: normalized by language population; x-axis: year/month; series (pgViews_perLang): Israel, Kuwait, Saudi Arabia, UAE, Jordan, Bahrain, Qatar, Egypt.
Arabic Wikipedia editing traffic?
Please refer to the extended abstract or ask the authors for more 
(Figure 2 and Figure 4).
English Wikipedia viewing traffic
Figure EN1. Viewing traffic trend lines. Y-axis: percent of the traffic; x-axis: year/month; series (pgViews_perLang): United States, Other, United Kingdom, Canada.
Figure EN3. Normalized viewing traffic trend lines. Y-axis: normalized by language population; x-axis: year/month; series (pgViews_perLang): Canada, United Kingdom, New Zealand, Australia, Ireland, United States, Malaysia, Netherlands, Spain, Italy, France, Germany.
English Wikipedia editing traffic
Figure EN2. Editing traffic trend lines. Y-axis: percent of the traffic; x-axis: year/month; series (pgEdits_perLang): United States, United Kingdom, Other, Canada.
Figure EN4. Normalized editing traffic trend lines. Y-axis: normalized by language population; x-axis: year/month; series (pgEdits_perLang): United Kingdom, New Zealand, Canada, Ireland, Australia.
[7] Liao, H.-T. 2013. How does localization
influence online visibility of user-generated
encyclopedias? A case study on Chinese-
language Search Engine Result Pages
(SERPs). Proceedings of the 9th
International Symposium on Open
Collaboration (Hong Kong, Aug. 2013).
[12] Unicode Consortium 2014. Language-Territory
Information, CLDR Version 25.

Buy Pinterest Followers, Reactions & Repins  Go Viral on Pinterest with Socio...Buy Pinterest Followers, Reactions & Repins  Go Viral on Pinterest with Socio...
Buy Pinterest Followers, Reactions & Repins Go Viral on Pinterest with Socio...
 
SluggerPunk Angel Investor Final Proposal
SluggerPunk Angel Investor Final ProposalSluggerPunk Angel Investor Final Proposal
SluggerPunk Angel Investor Final Proposal
 
Social Media Marketing Strategies .
Social Media Marketing Strategies                     .Social Media Marketing Strategies                     .
Social Media Marketing Strategies .
 
快速办理(BCR毕业证书)加州大学河滨分校毕业证文凭证书一模一样
快速办理(BCR毕业证书)加州大学河滨分校毕业证文凭证书一模一样快速办理(BCR毕业证书)加州大学河滨分校毕业证文凭证书一模一样
快速办理(BCR毕业证书)加州大学河滨分校毕业证文凭证书一模一样
 
The Evolution of SEO: Insights from a Leading Digital Marketing Agency
The Evolution of SEO: Insights from a Leading Digital Marketing AgencyThe Evolution of SEO: Insights from a Leading Digital Marketing Agency
The Evolution of SEO: Insights from a Leading Digital Marketing Agency
 
Your LinkedIn Success Starts Here.......
Your LinkedIn Success Starts Here.......Your LinkedIn Success Starts Here.......
Your LinkedIn Success Starts Here.......
 
Exploring The Dimensions and Dynamics of Felt Obligation: A Bibliometric Anal...
Exploring The Dimensions and Dynamics of Felt Obligation: A Bibliometric Anal...Exploring The Dimensions and Dynamics of Felt Obligation: A Bibliometric Anal...
Exploring The Dimensions and Dynamics of Felt Obligation: A Bibliometric Anal...
 
Transform Your Presence Now!..............
Transform Your Presence Now!..............Transform Your Presence Now!..............
Transform Your Presence Now!..............
 

Geographic and linguistic normalization opensym2014 poster

Geographic And Linguistic Normalization
Towards a better understanding of the geolinguistic dynamics of knowledge
Han-Teng Liao (hanteng@gmail.com) and Thomas Petzold (t.petzold@hmkw.de)
OpenSym '14, Aug 27-29 2014, Berlin, Germany. ACM 978-1-4503-3016-9/14/08. http://dx.doi.org/10.1145/2641580.2641623

Abstract
We propose a method of geolinguistic normalization to advance the existing comparative analysis of open collaborative communities, with multilingual Wikipedia projects as the example. Such normalization requires data regarding the potential users and/or resources of a geolinguistic unit.

What is Data Normalization?
Data normalization, or geographic normalization, allows data to be compared using a sensible common denominator, thereby producing measurements of intensity or density, such as population density [1, 2].
Data normalization is useful …
• in “factoring out the size” in order to facilitate comparisons across unequal areas or populations [2]
• in dividing one numeric attribute (e.g. GDP) by another (e.g. population) so as to derive a third numeric attribute (e.g. GDP per capita)
• in minimizing the differences caused by the size of a geographic unit
Our approach is similar to that of Crowston, Julien and Ortega [3] in “factoring out the size” but differs in the choice of size unit.
• Crowston et al. [3] proposed a measurement to compare how efficiently a language version turns potential users into actual contributors.
• They found “a strong (but not perfect) correlation” between the total number of Wikipedia contributors on one side, and the Internet population and the total tertiary-educated population on the other.

Finer normalization: geolinguistic unit
A language tag:
• often starts with a language code followed by a country code
• e.g. “fr-CA” = the geolinguistic unit of French used in Canada
• has corresponding data points in the Unicode Common Locale Data Repository (CLDR) Project
• e.g. “fr-CA” = 7,605,004 [12]
Finer geolinguistic data normalization is useful …
• for finer comparison between, say, Egyptian Arabic and Saudi Arabian Arabic speakers, or between Spanish speakers in Spain and in Mexico
• for analysts or designers to better know, and thus support, their users by providing appropriate interfaces and content [7]
• for better understanding of the Wikipedia traffic data

Comparing results: before and after data normalization
Arabic Wikipedia viewing traffic
[Figure 1. Viewing traffic trend lines. x-axis: Year/Month; y-axis: percent of the traffic (pgViews_perLang). Series: Egypt, Saudi Arabia, Other, Algeria.]
[Figure 3. Normalized viewing traffic trend lines. x-axis: Year/Month; y-axis: pgViews_perLang normalized by language population. Series: Israel, Kuwait, Saudi Arabia, UAE, Jordan, Bahrain, Qatar, Egypt.]
Arabic Wikipedia editing traffic? Please refer to the extended abstract or ask the authors for more (Figure 2 and Figure 4).
English Wikipedia viewing traffic
[Figure EN1. Viewing traffic trend lines. x-axis: Year/Month; y-axis: percent of the traffic (pgViews_perLang). Series: United States, Other, United Kingdom, Canada.]
[Figure EN3. Normalized viewing traffic trend lines. x-axis: Year/Month; y-axis: pgViews_perLang normalized by language population. Series: Canada, United Kingdom, New Zealand, Australia, Ireland, United States, Malaysia, Netherlands, Spain, Italy, France, Germany.]
English Wikipedia editing traffic
[Figure EN2. Editing traffic trend lines. x-axis: Year/Month; y-axis: percent of the traffic (pgEdits_perLang). Series: United States, United Kingdom, Other, Canada.]
[Figure EN4. Normalized editing traffic trend lines. x-axis: Year/Month; y-axis: pgEdits_perLang normalized by language population. Series: United Kingdom, New Zealand, Canada, Ireland, Australia.]

References (partial: those mentioned in this poster)
[1] American Planning Association 2006. Planning and Urban Design Standards. John Wiley & Sons.
[2] Cote, P. Effective Cartography: Mapping with Quantitative Data. Harvard Graduate School of Design.
[3] Crowston, K. et al. 2013. Sustainability of Open Collaborative Communities: Analyzing Recruitment Efficiency. Technology Innovation Management Review. January: Open Source Sustainability (2013).
[7] Liao, H.-T. 2013. How does localization influence online visibility of user-generated encyclopedias? A case study on Chinese-language Search Engine Result Pages (SERPs). Proceedings of the 9th International Symposium on Open Collaboration (Hong Kong, Aug. 2013).
[12] Unicode Consortium 2014. Language-Territory Information, CLDR Version 25.

Acknowledgments
We thank Wikimedia UK for the scholarship that enabled Han-Teng Liao to present the findings at OpenSym 2014. We also acknowledge the open source software tool Scrapy for making the web mining tasks easier.

Han-Teng Liao successfully defended his PhD at the Oxford Internet Institute (OII) in July 2014. His research focuses on user-generated content and data, Web analytics (webometrics), Chinese Internet research and integrated digital research designs (both qualitative and quantitative).
Thomas Petzold is a social technology analyst, TED speaker and professor of media management at HMKW, University of Applied Sciences for Media, Communication and Management in Berlin, Germany. As a research fellow at the WZB (2011–2013), he led a project on languages and big data in social technology. [photo: David Ausserhofer]
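The normalization described in this poster (dividing a traffic count for a geolinguistic unit by that unit's language population) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `parse_tag` and `normalize` are hypothetical, the page-view count is made up for illustration, and reading the poster's "fr-CA" = 7,605,004 figure [12] as a language population is an assumption based on the "Normalized by language population" axis label.

```python
from typing import Dict, NamedTuple


class GeolinguisticUnit(NamedTuple):
    language: str   # language code, e.g. "fr"
    territory: str  # country code, e.g. "CA"


def parse_tag(tag: str) -> GeolinguisticUnit:
    """Split a simple 'language-territory' tag such as 'fr-CA'.

    Only the two-part form mentioned on the poster is handled here;
    longer tags (scripts, variants) are rejected rather than guessed at.
    """
    parts = tag.split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'language-territory', got {tag!r}")
    language, territory = parts
    return GeolinguisticUnit(language.lower(), territory.upper())


def normalize(counts: Dict[str, float],
              populations: Dict[str, float]) -> Dict[str, float]:
    """Divide each count by the language population of the same unit.

    Units with a missing or zero population are skipped, because no
    sensible common denominator exists for them.
    """
    return {
        tag: counts[tag] / populations[tag]
        for tag in counts
        if populations.get(tag)
    }


# Figure quoted on the poster for fr-CA [12]; assumed to be a language population.
populations = {"fr-CA": 7_605_004}

# Hypothetical page-view count, invented purely for illustration.
page_views = {"fr-CA": 15_210_008}

views_per_speaker = normalize(page_views, populations)
unit = parse_tag("fr-CA")
print(unit.language, unit.territory, views_per_speaker["fr-CA"])
```

Keying both dictionaries by the full language tag keeps, say, Egyptian and Saudi Arabian Arabic traffic separate, so each is divided by its own denominator rather than by a single language-wide total.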