Prior empirical and theoretical work has discussed the role that dominant search engines play in information gatekeeping on the Web, and there are reports of Wikipedia ranking highly in search engine result pages (SERPs). However, little research has been conducted on non-Google search engines and non-English versions of user-generated encyclopedias. This paper proposes a method to quantify the “display” gatekeeping differences in SERP rankings and presents findings based on Chinese SERP data. Based on 2,500 mainly Chinese-language search queries, the data set includes the SERP outcomes for four Chinese-speaking regions (mainland China, Singapore, Hong Kong and Taiwan) provided by three major search engines (Baidu, Google and Yahoo), covering over 97% of the search engine market in each region. The findings, analysed and visualized using network analysis techniques, demonstrate the following: major user-generated encyclopedias are among the most visible sites, and localization factors matter (certain search engine variants, especially the mainland Chinese ones, produce the most divergent outcomes). The strong “network gatekeeping” effects of search engines indicated here also suggest similar dynamics inside user-generated encyclopedias.
Liao and Petzold OpenSym Berlin Wikipedia geolinguistic normalization Hanteng Liao
This paper proposes a method of geo-linguistic normalization to advance the existing comparative analysis of open collaborative communities, with multilingual Wikipedia projects as the example. Such normalization requires data regarding the potential users and/or resources of a geolinguistic unit.
What do Chinese-language microblog users do with Baidu Baike and Chinese Wiki... Hanteng Liao
ABSTRACT
This paper presents a case study of information engagement based on microblog posts gathered from Sina Weibo and Twitter that mentioned the two major Chinese-language user-generated encyclopaedias. The content analysis shows that microblog users not only engaged in public discussions by using and citing both encyclopaedias, but also shared their perceptions and experiences more generally with various online platforms and China’s filtering/censorship regime to which user-generated content and activities are subjected. This exploratory study thus raises several research and practice questions on the links between public discussions and information engagement on user-generated platforms.
Chinese-language literature about Wikipedia: a metaanalysis of academic searc... Hanteng Liao
ABSTRACT
This paper presents a webometric analysis of the academic search engine result pages (SERPs) for the Chinese-language term for “Wikipedia” across the major Chinese-speaking regions of mainland China, Hong Kong and Taiwan. Because the results are academic in nature, the findings can also be interpreted as a basis for further meta-analysis, or “research about research”, of Wikipedia research in the Chinese-language literature. The findings cover the results from four major search platforms: CNKI Scholar, Google Scholar China, Google Scholar Hong Kong and Google Scholar Taiwan. Cross-tabulation of the results shows the major institutions (journals and academic departments) and scholarly archives for Chinese-language Wikipedia research. The findings suggest that there is a divide between mainland Chinese academic sources and search results on the one hand, and Hong Kong and Taiwanese ones on the other. Meta-analysis based on academic SERPs has implications for identifying the gaps and potential in the internationalization of Wikipedia research.
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL... ijwscjournal
Social networks have widened the gap between the snapshot of the WWW traditionally stored in search engine repositories and the actual, ever-changing Web. The exponential growth in web users, and the ease with which they can upload content, highlights the need for content controls on material published on the web. As the definition of search changes, socially enhanced interactive search methodologies are the need of the hour. Ranking is pivotal for efficient web search, as search performance depends mainly on the ranking results. This paper proposes a new integrated ranking model based on the fused rank of a web object, derived from the popularity factor it earns over only valid interlinks from multiple social forums. The model identifies relationships between web objects in separate social networks based on the object inheritance graph. An experimental study indicates the effectiveness of the proposed fusion-based ranking algorithm in terms of better search results.
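Invited special lecture by Professor C. Wagner, author of “The New Invisible College” Han Woo PARK
Dr. Caroline S. Wagner
Professor, The Ohio State University, USA (current),
Editor-in-chief of the international journal Science and Public Policy (current)
Professor, Pennsylvania State University, USA (former); researcher, RAND Corporation, USA (former)
Yeungnam University, Humanities Building 2, Room 201 (Munpa Room), Friday, 23 October 2015, 3 to 5 p.m.
Hosted by: BK21+ Glocal East Asian Cultural Contents Project Team / Yeungnam University Cyber Emotions Research Institute
Inquiries: Department of East Asian Cultural Studies office, Yeungnam University (053-810-4505)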
The effects of Facebook use on civic participation attitudes and behaviour: A... Mark Dix
A paper written in 2011.
The following proposal suggests a network analysis approach to study the effects of web communication on civic participation. A three-phase mixed methods research design is proposed to examine firstly, the effect of supplementary communication via the social networking site Facebook, on the structure (quantity) and content (quality) of social ties within a network of citizens engaged in health and social care policymaking. It is proposed that the network variables of tie structure and content are then tested in an affective capacity against the participatory attitudes and behaviour of networked individuals. By reframing the study of web use and civic participation under a network theoretical framework, the proposed study will add to the existing literature in the field through recognition of the mediative capacity of relational ties in the formation of participatory capital. It is suggested that it is through their effect on relational tie structure and content within citizen participation networks, that social networking sites such as Facebook affect participatory attitudes and behaviour. To set a critical context for the proposed study, a final qualitative phase of research is suggested to examine the professional power structures impacting upon participant expressions of agency.
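"Theme Papers" session of the Korean Society for Journalism and Communication Studies 2016 Spring Conference
The theme of this conference is "The Future",
and invited papers on the theme will be presented and discussed in this session.
It can be considered the most important session among the conference's many events.
It runs for 100 minutes in Part 4 (15:50 to 17:30), in Room B225 of the ECC at Ewha Womans University (date: Saturday, 21 May).
Three papers will be presented during the 100 minutes, with one discussant for each paper.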
Mining and Analyzing Academic Social Networks Editor IJCATR
Academics establish relationships through various interactions, such as jointly authoring a research paper or report, jointly supervising a thesis, or working jointly on a project. Some of these relationships are ubiquitous, whereas others are hard to keep track of. Of all types of academic and research collaboration, co-authorship is the best documented. In this paper we analyze the co-authorship-based academic social networks of the computer science engineering departments of the Indian Institutes of Technology (IITs), as evidenced by their research publications produced during 2011 and 2015. We use social network analysis metrics to study the collaboration networks in four leading IITs. From the experimental results it can be concluded that IIT Delhi and IIT Kharagpur have close-knit collaboration networks, whereas the collaboration networks of IIT Kanpur and IIT Madras are fragmented. However, the collaboration networks of all four IITs exhibit network properties similar to those expected of any other collaboration network.
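As a rough illustration of what "close-knit" versus "fragmented" can mean in a co-authorship network, the sketch below builds a graph from author lists and computes two common social network analysis measures, density and the number of connected components. It is a minimal sketch only: the abstract does not say which metrics or tools the paper used, and the function names and toy author lists here are hypothetical.

```python
from itertools import combinations


def coauthorship_graph(papers):
    """Build an undirected co-authorship graph as an adjacency dict.

    `papers` is a list of author lists; every pair of co-authors on the
    same paper is linked by one edge.
    """
    graph = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def density(graph):
    """Share of possible author pairs that have actually co-authored."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in graph.values()) / 2
    return 2 * edges / (n * (n - 1))


def connected_components(graph):
    """Count groups of authors that are linked by some chain of co-authorship."""
    seen = set()
    count = 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return count


# Hypothetical toy data, not taken from the paper.
close_knit = [["A", "B", "C"], ["B", "C", "D"], ["A", "D"]]
fragmented = [["A", "B"], ["C", "D"], ["E"]]

for name, papers in [("close-knit", close_knit), ("fragmented", fragmented)]:
    g = coauthorship_graph(papers)
    print(name, round(density(g), 2), connected_components(g))
```

In this toy data the close-knit department forms a single densely connected component, while the fragmented one splits into several small groups with low density.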
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM csandit
With the growth of the Internet and the World Wide Web, information retrieval (IR) has attracted much attention in recent years. Quick, accurate and high-quality information mining is the core concern of successful search companies. At the same time, spammers try to manipulate IR systems to fulfil their stealthy needs. Spamdexing (also known as web spamming) is one of the spamming techniques of adversarial IR, allowing its practitioners to exploit the ranking of specific documents in the search engine result page (SERP). Spammers take advantage of different features of the web indexing system for nefarious motives. Suitable machine learning approaches can be useful in the analysis of spam patterns and the automated detection of spam. This paper examines content-based features of web documents and discusses the potential of feature selection (FS) in upcoming studies to combat web spam. The objective of feature selection is to select the salient features to improve prediction performance and to understand the underlying data generation techniques. A publicly available web data set, WEBSPAM-UK2007, is used for all evaluations.
Text mining has become one of the trendy fields and has been incorporated into several research fields, such as computational linguistics, Information Retrieval (IR) and data mining. Natural Language Processing (NLP) techniques are used to extract knowledge from text written by people. Text mining reads an unstructured form of data to provide meaningful information patterns in the shortest possible time. Social networking sites are a great source of communication, as most people today use these sites in their everyday lives to stay connected with each other. It has become common practice not to write sentences with correct grammar and spelling. This practice may lead to various kinds of ambiguity (lexical, syntactic and semantic), and because of this kind of unclear data it is hard to find the actual data order. Accordingly, we are conducting an investigation with the aim of looking for various text mining techniques to get different textual orders on social media websites. This survey aims to describe how studies in social media have used text analytics and text mining techniques to identify the key topics in the data. The survey focused on analysing the text mining studies related to Facebook and Twitter, the two dominant social media in the world. The results of this survey can serve as baselines for future text mining research.
Throughout history, several automobile brands have broken world records and created automotive pieces that became immortal, contributing to the history of their brands and to the lives of their consumers.
Opal is one of the most sought after gemstones and probably the most attractive of all available. If you want heads turning for you, opal jewellery is an obvious choice because the colors change with each movement. Visit opalmine.com for details.
INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP ijnlc
Information retrieval is becoming an integral part of every domain, whether in acquiring data from various sources to form a single unit or in presenting data so that anyone can extract useful information for use in data analysis, data mining and so on. The field has gained much importance in recent years because we are now flooded with many kinds of information from the real world. The growing importance of research data and the retrieval of intelligent data are a main focus for businesses today, so this is a field where major work needs to be done in the coming years. Here we implement a system for information retrieval from webpages using Natural Language Processing (NLP) and show that it obtains better results than the existing system. Webpages are home to a huge amount of information about various real-world entities. We have designed an information retrieval technique for the web using NLP, in which Hierarchical Conditional Random Fields (HCRF) and extended Semi-Markov Conditional Random Fields (Semi-CRF) are used along with Visual Page Segmentation to obtain accurate results. Parallel processing is also used to achieve the results within the desired time frame. The system further improves decision making between HCRF and Semi-CRF by using a bidirectional approach rather than a top-down approach, which enables better understanding of the content and page structure.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Abstract: As the deep web grows at a fast pace, there has been increased interest in techniques that help to locate deep web interfaces efficiently. However, because of the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework, namely Smart Crawler, for efficiently harvesting deep web interfaces. In the first stage, Smart Crawler performs site-based searching for center pages with the help of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a focused crawl, Smart Crawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, Smart Crawler achieves fast in-site searching by excavating the most relevant links with an adaptive link ranking. To eliminate bias towards visiting some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage for a website. Our experimental results on a set of representative domains show the agility and accuracy of our proposed crawler framework, which efficiently retrieves deep web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
Linked Data Generation for the University Data From Legacy Database dannyijwest
The Web was developed to share information among users over the Internet as a set of hyperlinked documents. Anyone who wants to collect data from the Web has to search and crawl through the documents to meet their needs. The concept of Linked Data creates a breakthrough at this stage by enabling links within the data. So, besides the web of connected documents, a new web has developed for both humans and machines: the web of connected data, known simply as the Linked Data Web. Since it is a very new domain, very little work has been done so far, especially on publishing legacy data within a university domain as Linked Data.
Taxonomy, Social Networks and Pace Layering Roger Hudson
Roger Hudson discusses the roles of search, taxonomy and social networks in information classification and retrieval. Can pace-layering help us find the best approach?
WEB EVOLUTION - THE SHIFT FROM INFORMATION PUBLISHING TO REASONING ijaia
The Web, as a communication channel, has undergone a variety of developments that allow information to be published and accessed in a scalable way. With the information revolution, research studies have been conducted to improve the present situation and propose advanced versions of the Web. It is therefore important to look into these new versions of the Web in order to improve the way information is expressed, to make more intelligent choices and to obtain better meaning from information on the Web. That is, the future Web would require a specific architecture to support the extraction of better meaning, or "reasoning". With Web 1.0 and Web 2.0, the information currently on the Web is not understandable by machines. Machine understanding is a big shift that opens the door wide for innovation and reasoning. In this work, we survey the progress of the Web from Web 1.0 through Web 2.0, Web 3.0 and Web 4.0 to Web 5.0. We point out the document types and technologies employed in order to understand the changes from Web 1.0 to Web 3.0 and to predict the future of the Web (Web 4.0 and Web 5.0). We also present the current status of, and concerns about, the Web as an information source and communication channel.
Removing Uninteresting Bytes in Software Fuzzing Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
A tale of scale & speed: How the US Navy is enabling software delivery from l... sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: The Art of Triggers and Actions in FME Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Epistemic Interaction - tuning interfaces to provide information for AI support Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Elevating Tactical DDD Patterns Through Object Calisthenics
Only the abstract here is included in the proceedings of the WikiSym +
OpenSym 2013 Conference (wsos2013). The full text is a work-in-progress
draft, revised based on blind-review comments and suggestions. Please
contact the author for the latest citation for this research.
How does localization influence online visibility of
user-generated encyclopedias? A study on Chinese-language
Search Engine Result Pages (SERPs)
Han-Teng Liao
Oxford Internet Institute
University of Oxford
Oxford, United Kingdom
hanteng@gmail.com
ABSTRACT
Prior empirical and theoretical work has discussed the role that
dominant search engines play in information gatekeeping on the Web,
and there are reports on the high ranking of the Wikipedia website
in search engine result pages (SERPs). However, little research has
been conducted on non-Google search engines and non-English versions
of user-generated encyclopedias. This paper proposes a method to
quantify the “display” gatekeeping differences of SERP rankings and
presents findings based on Chinese SERP data. Based on 2,500
mainly-Chinese-language search queries, the data set includes the
SERP outcomes for four Chinese-speaking regions (mainland China,
Singapore, Hong Kong and Taiwan) provided by three major search
engines (Baidu, Google and Yahoo), covering over 97% of the search
engine market in each region. The findings, analysed and visualized
using network analysis techniques, demonstrate the following: major
user-generated encyclopedias are among the most visible websites, and
localization factors matter (certain search engine variants produce
the most divergent outcomes, especially mainland Chinese ones). The
indicated strong effects of “network gatekeeping” by search engines
also suggest similar dynamics inside user-generated encyclopedias.
Categories and Subject Descriptors
[Human-centered computing]: Collaborative and social
computing – Collaborative filtering, Wikis, Empirical studies in
collaborative and social computing
[Information Systems]: Web search engines – Collaborative
filtering, Page and site ranking
General Terms
Management, Performance, Design, Human Factors, Theory
Keywords
Geo-linguistic analysis, network analysis, Network gatekeeping,
Chinese Internet, Chinese characters, Localization, censorship.
1. INTRODUCTION
Using search engines is among the most popular online activities for
users in the US (Fallows, 2008) and mainland China (CIC, 2009;
CNNIC, 2009), and has been among the driving forces of the
fast-growing online advertising platform (Varian, 2007; SEMPO, 2011;
IDATE, 2011; PricewaterhouseCoopers, 2011). It has been reported
(with speculation as to why) that Google, the global leader among
search engines, has consistently favoured Wikipedia, the global leader
among user-generated encyclopedias, by showing relevant pages
frequently and prominently in the search engine result pages
(hereafter SERPs) (Charlton, 2012; Čuhalev, 2006; Gray, 2007;
Jones, 2007; Silverwood-Cope, 2012). Independent market
research by Nielsen Online and Hitwise Intelligence has
demonstrated that Wikipedia not only dominates the online visits
for encyclopedia content, but also does so mainly because of the
traffic directed by major Web search engines (Hopkins, 2009;
Nielsen Online, 2008). Even the Wikimedia Foundation
acknowledged that Google drives traffic to Wikipedia, but
nonetheless argued that half of its readers did want to look for
Wikipedia content (Khanna, 2011). Thus, as major websites that
dominate traffic and user attention, Google and Wikipedia seem to
be central in guiding users where to look.
However, most of the findings and discussions are limited to or
predominantly focused on the English-language context (Battelle,
2005; Bermejo, 2009; Couvering, 2004, 2008; Dahlberg, 2005;
Hargittai, 2007; Segev, 2008), and little effort has been made to
understand whether such a phenomenon is specific to
Google/Wikipedia or can be found for other major search engines
and user-generated encyclopedias. In addition, the multilingual
internet and the rise of non-English users on the Web have multiple
implications for the “localization” effects on search engines.
Localization (hereafter L10n), a process of adapting computer
software or information systems for a group of users usually
defined by national boundaries or geo-linguistic profiles (Hussain
& Mohan, 2008; Liao, 2011; McKenna & Naftulin, 2000), is
expected to influence users’ information-seeking practices. Both
Google and Wikipedia provide localized content and interfaces
designed to serve different groups of users.
Because Google (or other general-purpose search engines),
Wikipedia (or other user-generated encyclopedias) and localization
are likely to present and thus frame the Web differently for different
groups of users, they effectively filter information for them. While
such filtering can be described as gatekeeping by communication
scholars, the fact that the Web users can directly or indirectly
participate in such information filtering processes has introduced
techniques and theories of "collaborative filtering" (Benkler, 2006;
Goldberg, Nichols, Oki, & Terry, 1992) and “network
gatekeeping” (Barzilai-Nahon, 2008). Indeed, while Google and
Wikipedia may concentrate Web traffic and command user
attention as major global websites, users’ contribution of web
content and links may also influence such filtering and gatekeeping
outcomes, as demonstrated by the case of the Google query
“Jew” (Bar Ilan, 2006): some users were organized to help
Wikipedia’s entry page for “Jew” rank higher in Google’s
English-language SERPs.
WikiSym '13 August 05 - 07 2013, Hong Kong, China
Copyright 2013 ACM 978-1-4503-1852-5/13/08 ...$15.00.
Thus, although both "collaborative filtering" (Benkler, 2006;
Goldberg et al., 1992) and “network gatekeeping” (Barzilai-Nahon,
2008) are indeed about filtering and keeping information, the
possibility of participation through user input makes them different
from the filtering and gatekeeping processes in traditional media.
Nonetheless, I argue that geographic and linguistic factors may
bound or limit such collaborative and networking possibilities and
thus re-introduce national and/or linguistic boundaries on
the Web. Indeed, as early as the early 2000s, researchers such as
Zittrain and Sunstein raised the issues of localized search
results filtering political content or fragmenting the public sphere
(Morris & Ogan, 2002; Sunstein, 2002). For SERPs, the question
of information control and linguistic boundaries remains, while the
“borders” of national frameworks have been reintroduced in many
aspects of technological and legal arrangements (University &
School, 2006). In particular, Google’s initial collaboration with (or
accommodation of) the Chinese government’s needs and its later exit
from mainland China have demonstrated the intricate political and
cultural dimensions of the “localization” of search engine services
(Vaughan & Zhang, 2007; Einhorn, 2010). Thus, the research gap on the
effects of localization on SERPs and non-English Wikipedia needs to be
filled, including the prominent cases of Chinese-language and
Arabic-language internet users, whose recent presence and
participation in the new internet world has also attracted much
attention (Dutta, Dutton, & Law, 2011). In particular, in order to
answer how search engines and/or user-generated encyclopedias
reintroduce or shape national or social boundaries, more empirical
work on L10n effects is needed (Aragón, Kaltenbrunner, Laniado, &
Volkovich, 2012; Bao et al., 2012; Hecht & Gergle, 2010; Liao, 2008,
2011; Luyt, Goh, & Lee, 2009; Massa & Scrinzi, 2012; Mazieres &
Huron, 2013; Petzold, Liao, Hartley, & Potts, 2012; Rogers &
Sendijarevic, 2012; Warncke-Wang, Uduwage, Dong, & Riedl,
2012). L10n is also briefly discussed as a contributing factor to the
“internationalization mechanisms” of “network
gatekeeping” (Barzilai-Nahon, 2008), holding the key for
researchers to understand the nationalization or internationalization
dynamics of the Web.
For the Chinese-language internet, there are many localized versions
provided by several major search engines, including examples such as
Yahoo China, Google Hong Kong and Google Taiwan. I call them
search engine-locale variants (hereafter search engine variants).
Do different search engine variants guide users from various
Chinese-speaking regions to see the same websites regardless of
which search engine they choose? Or do they see divergent SERPs?
Prior empirical research has analysed SERPs
inside mainland China, with the latest research on 316 search query
phrases of “Internet event” collected in 2009, indicating that
Baidu Baike and Chinese Wikipedia indeed ranked high among the
SERPs (Jiang & Akhtar, 2011). However, it focuses on (and thus is
limited to) simplified Chinese users in mainland China, and the
selected sample of search queries was based upon internet incidents
that are politically controversial in mainland China. This paper
contributes findings based on 2,500 search queries in 2011, covering
not only more topics but also more Chinese-language search
engines across more regions such as Hong Kong, Taiwan and
Singapore. Before presenting the methods and findings, the next
section first provides a theoretical framework that captures the
localization effects of search engines.
2. L10N OF SEARCH ENGINES
Observing how search engines categorise users is one of the
practical ways to examine the impact of search engines on national
and/or regional boundaries. As part of the industry practice of
internationalization/localization (i18n/L10n), search engines
provide different interfaces and services for different users, usually
categorized by their geo-linguistic identifiers, using language codes
such as zh-TW (Chinese in Taiwan), pt-BR (Portuguese in Brazil),
and en-IN (English in India) (DePalma, 2002; Dunne, 2006). These
identifiers in turn influence how content is aggregated, filtered and
prioritised for users who share the same or similar language
preferences. Online users and audiences are often partitioned
accordingly by search engine marketing tools such as Google
AdWords and Microsoft adCenter. Unlike in the globalized TV
industry, where broadcasting and cable TV are still bound to
geography, these geo-linguistic codes are configurable. For
example, one can use the UK version of Google even when
not in the UK.
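As a minimal sketch of how such geo-linguistic identifiers can be handled in practice, the following Python snippet splits a code such as zh-TW into its language and region parts. The names `Locale` and `parse_locale` are hypothetical and introduced only for illustration; they do not correspond to any search engine's API, and the sketch assumes simple two-part language-region codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    """Hypothetical container for a two-part geo-linguistic identifier."""
    language: str  # lower-case language part, e.g. "zh"
    region: str    # upper-case region part, e.g. "TW"


def parse_locale(code: str) -> Locale:
    """Split a language-region code such as 'zh-TW' (or 'zh_tw') into two parts."""
    parts = code.replace("_", "-").split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected a language-region code, got {code!r}")
    return Locale(language=parts[0].lower(), region=parts[1].upper())


# The three example codes mentioned in the text.
for code in ("zh-TW", "pt-BR", "en-IN"):
    print(parse_locale(code))
```

Keeping language and region as separate fields makes it straightforward to group users either by script preference or by region.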
To conceptualize the localization effects of search engines, this
paper applies the “network gatekeeping” theory (Barzilai-Nahon,
2008) for the following reasons. First, localization was discussed
as a contributing factor to the “internationalization mechanisms” of
“network gatekeeping” (Barzilai-Nahon, 2008). Although the theory
comes mainly from information science, where it serves to better
understand information control in network settings, its
multidisciplinary aspects (Jucquois-Delpierre, 2007) can help
researchers understand how seemingly technical arrangements of
computer software or information systems can have enormous effects on
gatekeeping or controlling the flows and presentation of information.
Second, distinct from traditional gatekeeping theory, which focuses on
the withholding or deletion of information, the network gatekeeping
theory not only conceptualizes localization as part of the
gatekeeping processes, but also emphasizes the “display” bases for
such processes: “Presenting information in a particular visual form
designed to catch the eye” (Barzilai-Nahon, 2008). Indeed, search
engines visually present the results. Thus, to understand the
localization effects of search engines, a data collection method
must consider not only the localization parameters but also the
visual display of search results.
I argue that locales in computing, sets of parameters that describe a
user’s language, region and other interface preferences, constitute
one of the most important online “situations” for online media. By
“situations” I follow the definition used by medium theorists in the
tradition of media ecology: “situations as (social)
information-systems that set the patterns of access to information”
(Meyrowitz, 1986, 1994). Note that as medium theorists focus on the
medium rather than on messages, the definition is particularly
suitable for studying search engines because some major companies
including Google have resisted the idea that they are in the content
or media industry by insisting that they are information companies.
For media and communication scholars, the underlying question is less
about Google’s industrial identity than about how online media in
general can use locales to segment, fragment and integrate different
media markets and/or audiences by using different information
system settings. Thus, geographic and linguistic factors seem to
“set the patterns of access to information”, as geo-linguistic
situations are expected to determine which websites will be the
most visible and most constantly appearing ones in the SERPs.
2.1 A Straight-forward Visibility Test
Because users often browse SERPs from top to bottom,
various market research (Enquiro, 2007), social science research
(Bar Ilan, 2006; Dunleavy, Margetts, Bastow, Pearce, & Tinkler,
2007; Margetts & Escher, 2006; Vaughan & Thelwall, 2004) and
industry practices (Slingshot SEO, 2011) have measured the level of
online visibility based on webometric data such as the positions in
SERPs (the higher the position, the more visible) and/or the number
of incoming web links from other websites. These measurements provide
the foundations for keyword search advertising (Brettel &
Spilker-Attig, 2010; Chen, 2008; B. J. Jansen, Brown, & Resnick, 2007;
B. J. Jansen & Mullen, 2008; J. Jansen, 2011; Jung, 2008; Malaga,
2008; Spindler, 2010). For marketing purposes, it is imperative to
boost the ranking of a website for a target set of search terms (or
search keywords). For the purpose of this research, the focus shifts
to the medium role of search engines between users and webpages.
As shown in Figure 1, search engines play the gatekeeping role by
curating different sets of web pages for different groups of users
characterized by their respective search engine variants. This
functions as “network” gatekeeping because search engines often
provide different rankings based on both user data and the
inter-linking data among the web pages themselves.
Figure 1. Search engines as the “network gatekeeper”
between users and web pages
To account for the difference made by the ranking positions in
SERPs, this research proposes a method to quantify such “display”
gatekeeping differences (Barzilai-Nahon, 2008). Because different
SERP rankings suggest different levels of visibility, different scores
can be assigned. One way to do so is to use click-through rate
(hereafter CTR) data for SERPs.
Commonly used in online advertising, CTR measures the number of
clicks on a web link divided by the number of times it is shown to
the users (i.e. clicks/impressions). For search engine marketing,
CTR indicates the probability of a listed web link being clicked.
Based on the arithmetic mean of the CTR for top-10 search results
from five different sources (Hearne, 2006; Jones, 2007; Young,
2011), I plotted the scatter chart in Figure 2 to show the relationship
between the SERP ranking and CTR. The top-ranking website is
expected to receive more than 30% of the traffic while the second
receives just a bit over 10%, and so on. The relationship between
the SERP ranking and CTR seems to follow a power function of the form
$y = ax^{b}$. Thus a power regression analysis was done to provide a
curve-fitting function of $y = 0.2889x^{-1.078}$, with a high R² value
(0.9934), suggesting a close fit. Thus for this research, the visibility
scores are assigned accordingly based on the SERP ranking.
Figure 2. Click-through Rates depending on the ranking in
the Search Engine Results Page (SERP)
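The fitted power function reported above can be turned into a small scoring helper. The following Python sketch is an illustration rather than the author's actual implementation: the function name `visibility_score` is an assumption, and only the two coefficients (0.2889 and -1.078) come from the text.

```python
# Coefficients of the fitted power function y = a * x**b reported in the text.
A = 0.2889
B = -1.078


def visibility_score(rank: int) -> float:
    """Estimated CTR-based visibility score for a SERP position (1 = top result)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return A * rank ** B


# Scores for the top-10 positions of a SERP.
top10_scores = {rank: visibility_score(rank) for rank in range(1, 11)}
for rank, score in top10_scores.items():
    print(f"rank {rank:2d}: {score:.4f}")
```

Because the exponent is negative, the score falls quickly with rank, so a website listed first contributes far more visibility than one listed tenth.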
While it is impossible to exhaust the SERPs to identify patterns of
preferred websites, previous research has established
that the top-10 search results in the first SERP occupy a significant
proportion of users’ attention and actual clicks (Hearne, 2006;
Jones, 2007; Young, 2011). Based on such estimated CTR data,
different visibility scores can be assigned to websites
depending on their ranking in the SERP, as shown in Figure 2.
A high SERP ranking does not always guarantee users’ actual clicks.
Nonetheless, it is justified to use CTR as a proxy for visibility scores
for the purpose of this research: it is a best-effort attempt based on
various sources of industry data.
2.2 Chinese Search Engine Markets
According to various survey, market and traffic reports from both
inside and outside mainland China (CIC, 2009; CNNIC, 2006,
2007; Nguyen, 2011; Russell, 2011; StatCounter, 2011), three
major search engines (Baidu, Google and Yahoo) dominate the
search engine markets across four regions (mainland China,
Singapore, Hong Kong, and Taiwan) and two Chinese script
preferences (simplified Chinese for mainland China and Singapore;
traditional Chinese for Hong Kong and Taiwan). Thus, nine search
engine variants can be derived from the combinations of search
engine providers and geo-linguistic preferences, which altogether
cover over 97% of the market:
For mainland China (mostly simplified Chinese users):
zh-cn: Baidu, Google (simplified Chinese) and Yahoo China
For Singapore (mostly simplified Chinese users):
zh-sg: Google Singapore and Yahoo Singapore
For Hong Kong (mostly traditional Chinese users):
zh-hk: Google Hong Kong and Yahoo Hong Kong
For Taiwan (mostly traditional Chinese users):
zh-tw: Google Taiwan and Yahoo Taiwan
These variants are hereafter abbreviated as Baidu_CN, Google_CN,
Yahoo_CN, Google_SG, Yahoo_SG, Google_HK, Yahoo_HK,
Google_TW and Yahoo_TW. Note that Baidu continues to enjoy its
lead in mainland China, with Google in second place, after Google
moved its mainland operations to Hong Kong (BBC, 2011). In
Hong Kong and Taiwan, around 2010 to 2011, Google
overtook Yahoo’s leading position, while maintaining its top
position in Singapore (StatCounter, 2011). With all these nine
variants, will the SERPs merge on a similar set of websites or
diverge? By answering this question, researchers can gain insights
into the converging and diverging effects of search engines for
Chinese-language users across these regions.
[Figure 1 diagram labels: Users (often categorized by providers and geo-linguistic settings); Search Engines; Web pages]
[Figure 2 plot: x-axis, Ranking of the Search Engine Results Page (1 to 10); y-axis, Visibility Scores (0% to 35%); series: weighted by CTR, unweighted; power trendline (weighted by CTR): $y = 0.2889x^{-1.078}$, R² = 0.9934]
2.3 Merging and diverging effects of SERPs
If the aforementioned market survey and traffic reports are correct,
search engine users from Taiwan mostly filter web pages through
the lens of the search engine variants Google_TW and Yahoo_TW.
Those from Hong Kong mostly use Google_HK and Yahoo_HK, and so on.
By conceptualizing search engines as a medium, the merging and
diverging patterns of SERPs will also indicate whether users from
these regions see similar websites when using different search
engine providers. Hence, the SERP data may indicate
which search engines may overcome offline boundaries across
these regions (if the SERPs converge on specific websites) and
which may reinforce them (if the SERPs diverge), thereby
contributing to the general question of media and globalization in
the case of search engines.
To do so, the proposed visibility test, which quantifies the
top-ranking websites, can be used as an indication of search engines
exercising their “display” gatekeeping power for certain websites.
Based on the quantified display gatekeeping
power, the visibility patterns can be systematically examined
between (1) search engine variants and (2) visible websites.
Moreover, visibility scores can be further aggregated (i.e. summed)
over a selection of search queries, so as to better answer the different
research questions that guide such selection. Ideally, by exhaustively
computing visibility scores for various localized versions of SERPs
over a large sample of search queries, researchers can better compare
how visible a website is across different search engine variants,
thereby paving the way for showing the merging and diverging patterns
of the SERPs.
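The aggregation step described above (summing visibility scores over a selection of search queries, per search engine variant and website) can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout `(variant, query, rank, website)`, the function names and the toy rows are all hypothetical, and only the power-function coefficients come from the text.

```python
from collections import defaultdict


def visibility_score(rank, a=0.2889, b=-1.078):
    """CTR-based visibility score from the fitted power function in the text."""
    return a * rank ** b


def aggregate_visibility(serp_rows):
    """Sum visibility scores per (variant, website) pair over all queries.

    serp_rows: iterable of (variant, query, rank, website) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for variant, _query, rank, website in serp_rows:
        totals[(variant, website)] += visibility_score(rank)
    return dict(totals)


# Hypothetical toy rows, for illustration only (not data from the study).
rows = [
    ("Google_TW", "query-1", 1, "wikipedia.org"),
    ("Google_TW", "query-2", 2, "wikipedia.org"),
    ("Baidu_CN", "query-1", 1, "baidu.com"),
    ("Baidu_CN", "query-2", 3, "hudong.com"),
]
totals = aggregate_visibility(rows)
for (variant, website), score in sorted(totals.items()):
    print(f"{variant:10s} {website:15s} {score:.4f}")
```

The resulting totals form a variant-by-website table of weights, which is one possible input for the network analysis mentioned in the paper.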
It should be noted that, borrowing from academic research on
webometric visibility and industry practice in keyword
advertising, the proposed framework and method are general enough
for future studies regardless of the providers and/or geo-linguistic
preferences of search engines. For example, how different, or
similar, are the SERPs provided by Yandex versus Google in
Turkey? How different, or similar, are the SERPs provided by
Google Hindi versus Google Urdu in India? The resulting
visibility scores can be further visualized and analysed with various
network analysis techniques. Thus, this method can answer such
empirical questions, with results that can then be interpreted to
explore the cultural and political implications of such patterns.
To showcase how the integrated method works satisfactorily, I
choose to study Chinese-language internet because its boundaries
have several historical, cultural and political complications. For
example, regions such as mainland China, Singapore, Hong Kong
and Taiwan have different practices in democracy, free speech,
human rights and Chinese scripts (Damm, 2007; Liao, 2009; Zhao
& Baldauf, 2007).
3. DATA COLLECTION
To identify how search engine variants influence the Chinese-
language SERPs, the top-10 results should provide enough
indication.
3.1 Search Queries
First, I selected about 2,500 search queries that are relevant to
Chinese cultural and political topics. As summarized in Table 1, the
selection includes all 990 entries in "The Cambridge Encyclopedia
of China" (The Cambridge encyclopedia of China, 1991), the top 10
search terms provided respectively by Baidu and Google (including
mainland China, Hong Kong and Taiwan variations) in various
categories since 2007, major popular cultural references, notable
people's names and some other culturally and politically "sensitive"
keywords. Although other selections or combinations are possible, this
selection aims to focus this research on the prominence of
user-generated encyclopedias across Chinese-speaking regions.
Table 1. Sources and numbers of search queries
Second, the sample keywords are converted into search queries
according to the respective Chinese orthographic preferences
(simplified Chinese for mainland China and Singapore; traditional
Chinese for Hong Kong and Taiwan), making this research the first of
its kind to compare SERPs across Chinese-language variants.
Third, the top-10 SERPs are collected for the nine search engine
variants that cover the four major Chinese-speaking regions of
mainland China, Singapore, Hong Kong and Taiwan. They are then
parsed and processed by the visibility tests, which assign higher
visibility scores to higher-ranking websites.
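The weighting step can be illustrated with a minimal Python sketch. The visibility tests themselves are not restated in this section, so the linear weight used here (1.0 for rank 1, falling to 0.1 for rank 10), the function names `rank_weight` and `visibility_scores`, and the sample queries are all assumptions made purely for illustration.

```python
from collections import defaultdict


def rank_weight(rank: int, top_n: int = 10) -> float:
    """Hypothetical weight: 1.0 for rank 1, decreasing linearly to 0.1 for rank 10."""
    if not 1 <= rank <= top_n:
        return 0.0
    return (top_n - rank + 1) / top_n


def visibility_scores(serps):
    """Sum rank weights per (search engine variant, website) pair.

    `serps` is an iterable of (variant, query, ranked_sites), where
    ranked_sites lists the websites of one SERP from rank 1 downwards.
    """
    scores = defaultdict(float)
    for variant, _query, ranked_sites in serps:
        for rank, site in enumerate(ranked_sites[:10], start=1):
            scores[(variant, site)] += rank_weight(rank)
    return dict(scores)


# Made-up SERPs, for illustration only.
sample = [
    ("Google_HK", "query_a", ["zh.wikipedia.org", "baike.baidu.com"]),
    ("Google_HK", "query_b", ["youtube.com", "zh.wikipedia.org"]),
    ("Baidu_CN", "query_a", ["baike.baidu.com", "baidu.com"]),
]
scores = visibility_scores(sample)
```

Under these assumed weights, a website that appears often and near the top of a variant's SERPs accumulates a larger score for that variant, which is the property the visibility tests are described as having.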
3.2 Search Results
Around 22,000 web links are extracted from the SERPs returned
for the 2,500 search queries submitted across the nine
search engine variants in 2011. These 22,000 web links
correspond to around 25,000 unique domain names. The
outcome is then consolidated manually, by checking IP addresses,
into over 16,000 websites (e.g. the website sohu.com aggregates
money.sohu.com and women.sohu.com). Finally, all education and
government websites are aggregated under their respective
second-level domain names, such as edu.tw, edu.cn, gov.cn and gov.hk.
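The aggregation rule can be sketched in Python as follows. This is only a suffix-matching heuristic under stated assumptions: the manual IP-address check described above is not reproduced, and the function name `aggregate_host` and the `KNOWN_SITES` lookup are hypothetical. Only the suffixes and the sohu.com example come from the text.

```python
# Education and government suffixes named in the text.
AGGREGATED_SUFFIXES = ("edu.tw", "edu.cn", "gov.cn", "gov.hk")

# Hypothetical lookup of websites whose subdomains were confirmed
# (in the paper, manually via IP addresses) to belong together.
KNOWN_SITES = ("sohu.com",)


def aggregate_host(host: str) -> str:
    """Map a hostname to the aggregated website it is counted under."""
    host = host.lower().rstrip(".")
    for suffix in AGGREGATED_SUFFIXES + KNOWN_SITES:
        if host == suffix or host.endswith("." + suffix):
            return suffix
    # Hostnames without a known aggregate are kept as they are.
    return host
```

For instance, `aggregate_host("money.sohu.com")` returns `"sohu.com"`, while a hostname such as `zh.wikipedia.org` is left unchanged because no aggregate is registered for it in this sketch.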
4. FINDINGS
To show how localization influences online visibility, the collected
data of visibility scores are unpacked and analysed as follows.
4.1 Concentrated visibility scores
Figure 3 shows the proportional distribution and the
accumulative distribution of visibility scores for the top-100 most
visible websites. Nearly 80% of the visibility scores
are concentrated in the top-100 websites, and three websites hosting
user-generated encyclopedias rank highest: (1) wikipedia.org,
(2) baidu.com and (3) hudong.com. Within wikipedia.org,
Chinese Wikipedia (zh.wikipedia.org) is the most visible; within
baidu.com, Baidu Baike (baike.baidu.com) is the most visible.
Categories of Search Keywords (Table 1)                  Numbers
The Cambridge Encyclopedia of China                          990
Top 10 Search Terms (Google and Baidu)                       387
Best Film/Popular Music (China, Hong Kong, Taiwan)           364
Modern Concepts (shared with modern Japanese)                171
Notable People                                               476
  Nobel Prize Winners of Chinese origin                       11
  Major Chinese Politicians                                  187
  Rich People (China, Hong Kong, Taiwan)                      82
  100 Contemporary Intellectuals (China)                     100
  Major Fugitives From Taiwan                                 17
  Victims of White Terror in Taiwan                           79
Potentially Sensitive Terms                                  112
  Japanese AV porn stars                                      48
  Prosecuted and Sentenced Corrupted Chinese Officials        14
  Documented Filtered Words by Great Firewall                 50
Total                                                       2500
Figure 3. Concentrated visibility scores
Since the top-100 most visible websites account for about 80%
of the visibility scores, a strong concentration effect is found. Thus,
the following sub-section further examines these websites.
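The proportion and accumulative curves of Figure 3 correspond to a simple computation, sketched here in Python. The function name `cumulative_shares` and the toy totals are assumptions for illustration only; they are not the study's data.

```python
def cumulative_shares(totals):
    """Return (website, share, accumulative share) rows, most visible first.

    `totals` maps each website to its total visibility score summed
    over all search engine variants.
    """
    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for site, score in ranked:
        share = score / grand_total
        running += share
        rows.append((site, share, running))
    return rows


# Toy totals, not taken from the dataset.
toy_totals = {"site_a": 50.0, "site_b": 30.0, "site_c": 15.0, "site_d": 5.0}
rows = cumulative_shares(toy_totals)
top2_share = rows[1][2]  # accumulative share held by the two most visible sites
```

Reading the accumulative share at rank 100 of the real ranking is what yields the concentration figure reported for the top-100 websites.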
4.2 Tabulating visibility scores
Table 2 tabulates the top-100 ranking websites and their respective
visibility scores for each search engine variant. Each cell shows
the visibility score that a search engine variant has contributed to a
particular website. For example, the first cell, 34.30, indicates how
much Baidu_CN has contributed to Chinese Wikipedia
(zh.wikipedia.org).
Table 2 Top-ranking websites: visibility scores
Note that the top three are all user-generated encyclopedias: Chinese
Wikipedia, Baidu Baike and Hudong Baike. As another example,
the official news website of Falun Gong (epochtimes.com,
ranked 18th) is completely absent from Baidu’s results (i.e.
the zero visibility score suggests that it never shows up in Baidu’s
SERPs). In direct contrast, for Yahoo_HK (the third-last
column) it enjoys a visibility score higher than those of almost all
mainland-based websites, including the Chinese official media People’s
Daily (people.com.cn, ranked 15th), suggesting that the
Falun Gong news website performs better even than People’s Daily
on Yahoo Hong Kong.
Therefore, Table 2 shows in detail which search engine variants
favour which websites by listing them more often and more
prominently in SERPs, making them easier to find (at least
for the selected search queries). The top-ranking websites
include major China-based portals (e.g. baidu.com, sina.com.cn,
qq.com, sohu.com and 163.com), US-based websites (e.g.
youtube.com, facebook.com), mainland China-based news media
websites (e.g. people.com.cn, xinhuanet.com, ifeng.com) and the
aggregated category of mainland Chinese government websites
(i.e. gov.cn).
Table 2 orders the websites from the most visible at the top row
to the least visible at the bottom row, while the search
engine variants are ordered first by provider (Baidu, Google, then
Yahoo) and second by region (CN, HK, SG, then TW). It is relatively
difficult, however, to see any pattern directly from Table 2 as
tabulated. In other words, although each cell shows how strongly a
search engine variant favours a certain website in its SERPs, the
table as a whole does not show clearly
which "group" of search engine variants favours which "set" of
websites.
To identify patterns of converging and diverging, I will use
blockmodeling analysis in the next subsection to study the visibility
scores in Table 2, each of which represents the strength of ties
between search engines and websites. To avoid arbitrary clustering
results produced by less-consequential websites collected in the
SERPs, only the top-100 most visible websites are considered for
analysis.
4.3 Clustering using blockmodeling analysis
Cluster analysis is commonly used in exploratory data mining to
find how data points can be grouped on the basis of their
statistical similarities and differences. To find how
“birds of a feather flock together” among the websites and search
engine variants at hand, various clustering techniques could be
applied, including agglomerative hierarchical clustering,
which produces a family tree (dendrogram) detailing how the data
points can be grouped.
Nonetheless, this study chooses blockmodeling analysis (Doreian,
Batagelj, & Ferligoj, 2004) for the following reasons. First, a
blockmodel analysis produces a simplified outcome that better suits
the research question at hand: identifying the rough
patterns, without needing the specific details of which
website is closer to which. Second, as will be shown later, a
blockmodel analysis can greatly simplify a complex dataset and
provide a succinct summary of its overall structure. Third, because
researchers can and must design a blockmodel for the data points to fit,
a blockmodel analysis is particularly useful for identifying converging
and diverging patterns, and it provides a systematic way to see how
well the data points fit the model. Fourth, a blockmodel can be
seen as a simplified network, and thus it can help produce a
simplified visualization of network data.
It should be noted that the
dataset can be seen as a two-mode network: different “nodes” of
search engine variants give different visibility scores to different
“nodes” of websites. It is thus equivalent to a network of visibility
scores, in which high visibility scores indicate strong “relationships”.
It is a two-mode network because there are two types of nodes
(i.e. search engine variants and websites) and ties exist only
between nodes of different types (i.e.
the visibility score contributed by one search engine variant to one
website).
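The two-mode structure can be made concrete with a short Python sketch. The three rows are copied from Table 2; the edge-list representation and the name `two_mode_edges` are assumptions chosen for illustration, not the data format used in the study.

```python
# Search engine variants in the column order of Table 2.
VARIANTS = ["Baidu_CN", "Google_CN", "Google_HK", "Google_SG", "Google_TW",
            "Yahoo_CN", "Yahoo_HK", "Yahoo_SG", "Yahoo_TW"]

# Visibility scores for three websites, copied from Table 2.
SCORES = {
    "zh.wikipedia.org": [34.30, 272.37, 611.39, 304.15, 586.50,
                         24.46, 833.95, 254.00, 721.01],
    "baike.baidu.com": [661.93, 410.28, 174.04, 433.81, 125.52,
                        72.44, 39.10, 508.05, 4.88],
    "people.com.cn": [14.54, 23.19, 16.00, 23.82, 18.14,
                      20.97, 17.81, 11.43, 13.39],
}


def two_mode_edges(scores, variants):
    """List (variant, website, weight) ties; ties never link two nodes of one type."""
    edges = []
    for site, row in scores.items():
        for variant, weight in zip(variants, row):
            if weight > 0:
                edges.append((variant, site, weight))
    return edges


edges = two_mode_edges(SCORES, VARIANTS)
strongest = max(edges, key=lambda edge: edge[2])
```

Every tie runs from a search engine variant to a website, never between two variants or two websites, which is what makes the network two-mode. In this excerpt the strongest tie is the one from Yahoo_HK to zh.wikipedia.org.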
4.3.1 A blockmodel design
[Figure 3 plot: proportion and accumulative proportion of visibility scores (vertical axis, 0% to 80%) against website rank (horizontal axis, 0 to 100).]
Table 2 (body): visibility scores of the top-20 websites by search engine variant
Rk  Website (aggregated)  Baidu_CN Google_CN Google_HK Google_SG Google_TW  Yahoo_CN  Yahoo_HK  Yahoo_SG  Yahoo_TW
 1  zh.wikipedia.org         34.30    272.37    611.39    304.15    586.50     24.46    833.95    254.00    721.01
 2  baike.baidu.com         661.93    410.28    174.04    433.81    125.52     72.44     39.10    508.05      4.88
 3  hudong.com                5.30    107.93     71.29    107.92     57.31    267.17      2.54    168.23      0.35
 4  baidu.com               385.80     51.36     13.29     53.21      9.93     20.52      7.17    102.80      1.65
 5  sina.com.cn              59.18     76.85     21.69     69.33     16.63     41.70      2.04     35.29      0.68
 6  knowledge.yahoo.com       0.10      0.03      0.29      0.00      0.36      0.00     93.46     20.33    140.07
 7  edu.tw                    0.46      5.14     21.14      7.21     64.29      0.06     30.61     21.07    102.98
 8  qq.com                   40.27     41.23     13.00     37.26     11.64     57.85      2.07     23.35      0.95
 9  youtube.com               0.29      8.39     66.03      9.04     68.63      0.00     45.20      4.96     19.00
10  gov.cn                   25.46     38.94     20.30     32.29     15.61     43.03      5.29     34.84      3.57
11  sohu.com                 20.89     32.82     10.08     27.34      8.08     38.97      3.18     22.11      1.57
12  163.com                  25.59     34.68     10.78     31.51     10.00     32.31      2.52     14.56      0.87
13  facebook.com              0.29      1.93      8.96      2.26     19.00      0.00     88.33      8.31     33.61
14  youku.com                42.04     29.12     10.32     19.34      8.41     36.38      1.03     15.31      0.64
15  people.com.cn            14.54     23.19     16.00     23.82     18.14     20.97     17.81     11.43     13.39
16  blog.sina.com.cn         21.73     28.47     15.41     26.79     13.95      9.75      4.27     33.78      2.53
17  xinhuanet.com            26.13     27.18     21.02     27.71     20.06     11.50      1.70     19.31      0.40
18  epochtimes.com            0.00      1.05     27.34      2.23     33.05      0.00     34.57      3.93     36.62
19  ifeng.com                25.67     25.13     11.86     24.39      9.67     16.70      4.20     10.12      2.56
20  baike.soso.com           11.08      7.60      1.31      5.93      1.05     29.16      0.29     63.30      0.04
Before detailing how the clustering outcome helps identify the
merging and diverging patterns systematically, it is necessary to
explain the basis on which I design the blockmodel in Table 3. To
build a blockmodel, researchers have to make design decisions on
the “connection types” (e.g. “complete” versus “null”) and the
number of blocks. A block is said to be “complete” if all cells in
that block indicate strong relationships, and “null”
if all cells in that block contain only weak or no relationships.
The three-by-three blockmodel in Table 3 thus assumes that the data
points will fit into nine blocks: the nine search engine variants
are divided into three groups, and the top-100 websites are
categorized into three sets.
Table 3 Expected outcome of blockmodeling
The rationale behind this model is to identify converging and
diverging patterns. The second part of Table 3 shows how three
groups of search engine variants (Cluster A, B and C) may
converge or diverge on different sets of websites (Cluster X, Y and
Z). I assume that a middle ground of websites exists: for all search
engine variants, there will be a set of websites that is visible to
all of them (i.e. Cluster Y). That is, Clusters A, B and C converge
on Cluster Y with high visibility scores, indicated by the dark
blocks containing strong ties. To account for any deviation
from the "converging" middle ground, I expect two blocks of
low-visibility cells (i.e. weak or no relationships), represented by
the two white blocks in Table 3: one at the top-left and another at
the bottom-right. Both blocks indicate patterns of divergence,
or lack of convergence. For this study, if all search engine variants
converged on the same top visible websites, there would be no
pattern of divergence. Using this scenario of complete
convergence as the null hypothesis (no difference in visibility
patterns), I expect some evidence of diverging effects that rejects
the null hypothesis: if a significant number of websites fall in the
low-visibility blocks (one at the upper-left and another at the
lower-right corner), then the diverging patterns are identified
accordingly.
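How a designed blockmodel is compared with the data can be sketched in Python. The ideal image follows the design described above (null blocks at the top-left and bottom-right), the scores are copied from Table 4 and the cluster memberships from Tables 4 and 5. The cut-off of 5.0 between "strong" and "weak" ties is a hypothetical value chosen only for this illustration, since the criterion actually used is not stated here, and the sketch only scores a given partition rather than searching for the best one as a blockmodeling tool would.

```python
# Designed image: 1 = "complete" block (strong ties), 0 = "null" block.
IDEAL_IMAGE = {
    "X": {"A": 0, "B": 1, "C": 1},
    "Y": {"A": 1, "B": 1, "C": 1},
    "Z": {"A": 1, "B": 1, "C": 0},
}

# One search engine variant per cluster (memberships as in Table 5).
VARIANT_CLUSTER = {"Baidu_CN": "A", "Google_HK": "B", "Yahoo_TW": "C"}

# Website memberships as in Table 4.
SITE_CLUSTER = {"zh.wikipedia.org": "X", "youtube.com": "X",
                "people.com.cn": "Y", "sohu.com": "Z"}

# Visibility scores copied from Table 4.
SCORES = {
    "zh.wikipedia.org": {"Baidu_CN": 34.30, "Google_HK": 611.39, "Yahoo_TW": 721.01},
    "youtube.com": {"Baidu_CN": 0.29, "Google_HK": 66.03, "Yahoo_TW": 19.00},
    "people.com.cn": {"Baidu_CN": 14.54, "Google_HK": 16.00, "Yahoo_TW": 13.39},
    "sohu.com": {"Baidu_CN": 20.89, "Google_HK": 10.08, "Yahoo_TW": 1.57},
}

THRESHOLD = 5.0  # hypothetical cut-off between strong and weak ties


def count_mismatches(scores, site_cluster, variant_cluster, threshold):
    """Count cells whose observed tie strength contradicts the designed block."""
    mismatches, cells = 0, 0
    for site, row in scores.items():
        for variant, value in row.items():
            observed = 1 if value >= threshold else 0
            expected = IDEAL_IMAGE[site_cluster[site]][variant_cluster[variant]]
            mismatches += int(observed != expected)
            cells += 1
    return mismatches, cells


mismatches, cells = count_mismatches(SCORES, SITE_CLUSTER, VARIANT_CLUSTER, THRESHOLD)
```

With this assumed cut-off, the only mismatching cell in the excerpt is Baidu_CN's tie to zh.wikipedia.org, which is stronger than the null block at the top-left predicts. A real analysis would count such cells over all 900 combinations of variants and websites.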
4.3.2 Patterns of merging and diverging
Using the blockmodeling function provided by the social network
analysis tool Pajek, the 9-by-100 cells of strong versus weak
ties are simplified into the three-by-three blockmodel shown in
Table 4. For each cell, the colour represents a strong (dark) or weak
(white) tie, and the cells are roughly partitioned into three-by-three
blocks, thereby clustering the nine search engine
variants into three groups and the 100 most visible websites into
three sets. The match is not perfect: 87 cells out of 900
(9.67%) do not match the designed blockmodel. Given the
space limitation, only the top-20 websites are shown in full.
As shown in Table 4, of the top-100 websites, 39 are
categorized into the first cluster of websites (Cluster X), 13 into
Cluster Y and 48 into Cluster Z. Among the top-20 most visible
websites, the converging set (Cluster Y) is thin: it contains
only one website, people.com.cn, which belongs to the
Chinese official party organ People’s Daily.
Table 4 Blockmodeling outcome
Blockmodel (rows: website Clusters X, Y, Z; columns: search engine Clusters A, B, C):
            Cluster A  Cluster B  Cluster C
Cluster X   weak       strong     strong
Cluster Y   strong     strong     strong
Cluster Z   strong     strong     weak
Rk  Website (aggregated)  Baidu_CN  Yahoo_CN Google_CN  Yahoo_SG Google_SG Google_TW Google_HK  Yahoo_HK  Yahoo_TW
Cluster X (39 websites)
 1  zh.wikipedia.org         34.30     24.46    272.37    254.00    304.15    586.50    611.39    833.95    721.01
 6  knowledge.yahoo.com       0.10      0.00      0.03     20.33      0.00      0.36      0.29     93.46    140.07
 7  edu.tw                    0.46      0.06      5.14     21.07      7.21     64.29     21.14     30.61    102.98
 9  youtube.com               0.29      0.00      8.39      4.96      9.04     68.63     66.03     45.20     19.00
13  facebook.com              0.29      0.00      1.93      8.31      2.26     19.00      8.96     88.33     33.61
18  epochtimes.com            0.00      0.00      1.05      3.93      2.23     33.05     27.34     34.57     36.62
… and other 33 websites (The total number of websites is 39 for this block)
Cluster Y (13 websites)
15  people.com.cn            14.54     20.97     23.19     11.43     23.82     18.14     16.00     17.81     13.39
… and other 12 websites (The total number of websites is 13 for this block)
Cluster Z (48 websites)
 2  baike.baidu.com         661.93     72.44    410.28    508.05    433.81    125.52    174.04     39.10      4.88
 3  hudong.com                5.30    267.17    107.93    168.23    107.92     57.31     71.29      2.54      0.35
 4  baidu.com               385.80     20.52     51.36    102.80     53.21      9.93     13.29      7.17      1.65
 5  sina.com.cn              59.18     41.70     76.85     35.29     69.33     16.63     21.69      2.04      0.68
 8  qq.com                   40.27     57.85     41.23     23.35     37.26     11.64     13.00      2.07      0.95
10  gov.cn                   25.46     43.03     38.94     34.84     32.29     15.61     20.30      5.29      3.57
11  sohu.com                 20.89     38.97     32.82     22.11     27.34      8.08     10.08      3.18      1.57
12  163.com                  25.59     32.31     34.68     14.56     31.51     10.00     10.78      2.52      0.87
14  youku.com                42.04     36.38     29.12     15.31     19.34      8.41     10.32      1.03      0.64
16  blog.sina.com.cn         21.73      9.75     28.47     33.78     26.79     13.95     15.41      4.27      2.53
17  xinhuanet.com            26.13     11.50     27.18     19.31     27.71     20.06     21.02      1.70      0.40
19  ifeng.com                25.67     16.70     25.13     10.12     24.39      9.67     11.86      4.20      2.56
20  baike.soso.com           11.08     29.16      7.60     63.30      5.93      1.05      1.31      0.29      0.04
… and other 35 websites (The total number of websites is 48 for this block)
Legend: cell shading distinguishes relatively strong from weak ties; block shading distinguishes strong from weak blocks in the blockmodel.
These blockmodeling findings also help identify the merging and
diverging patterns of search engine variants. Cluster A contains
Baidu_CN, Yahoo_CN and Google_CN; Cluster B contains
Google_HK, Google_SG, Google_TW and Yahoo_SG; Cluster C
contains Yahoo_HK and Yahoo_TW. The clustering outcome shown
in Table 5 indicates patterns of both merging and diverging,
determined by the choice of search engine variant. Of the three
groups, two deviate from the rest. The first (Cluster A) contains
the search engine variants designed for mainland China (Baidu_CN,
Yahoo_CN and Google_CN), and the second (Cluster C)
contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK
and Yahoo_TW). Thus, while the search engine variants in Cluster
B produce converging results for the top-100 websites, with
“complete” connection types to all clusters of websites, those in
Cluster A and those in Cluster C lead to diverging SERPs.
Table 5 Clusters identified by blockmodeling
4.4 Visualizing and unpacking findings
Table 3 (body), first part: designed connection types (blank = null)
            Cluster A  Cluster B  Cluster C
Cluster X              complete   complete
Cluster Y   complete   complete   complete
Cluster Z   complete   complete
[Table 3, second part: the same three-by-three grid (Clusters X, Y, Z by Clusters A, B, C) drawn as dark and white blocks, with three blocks labelled “converging”.]
Table 5 (body)
               #   Cluster A  Cluster B  Cluster C
Variants           Baidu_CN   Google_HK  Yahoo_HK
                   Google_CN  Google_SG  Yahoo_TW
                   Yahoo_CN   Google_TW
                              Yahoo_SG
Websites
Cluster X     39              complete   complete
Cluster Y     13   complete   complete   complete
Cluster Z     48   complete   complete
To show the visibility scores in a more intuitive manner,
a network visualization of the top-800 most visible websites
is shown in Figure 4. I visualize the nine search engine variants
(shown as text boxes at the periphery) and the 800 most visible
websites (shown as nodes in the middle). The two-mode
network is thus presented in a way that indicates the overall
likelihood of a given search engine variant recommending a website
shown in the middle. Pointing only from a search engine variant node
to a website node, each arrow represents the total visibility score
contributed by that search engine variant to that website, with the
arrow width proportional to the visibility score: wider arrows
indicate higher visibility scores. Similarly, the area of a node
is proportional to the sum of the visibility scores a website
receives from all search engine variants, allowing easy comparison
of which websites are more visible.
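The two visual encodings can be written down as a small Python sketch. Because node area, not radius, is proportional to the summed score, the radius grows with the square root of the score. The function names and the `scale` and `max_width` parameters are assumptions for illustration; the actual scaling used to draw Figure 4 is not given in the text. The two rows are copied from Table 2.

```python
import math


def node_radius(total_score: float, scale: float = 1.0) -> float:
    """Radius of a node whose area is proportional to its total score."""
    return scale * math.sqrt(total_score / math.pi)


def arrow_width(score: float, max_score: float, max_width: float = 10.0) -> float:
    """Arrow width proportional to the visibility score of one tie."""
    return max_width * (score / max_score)


# Rows copied from Table 2 (nine search engine variants each).
wikipedia_row = [34.30, 272.37, 611.39, 304.15, 586.50,
                 24.46, 833.95, 254.00, 721.01]
peoples_daily_row = [14.54, 23.19, 16.00, 23.82, 18.14,
                     20.97, 17.81, 11.43, 13.39]

r_wiki = node_radius(sum(wikipedia_row))
r_pd = node_radius(sum(peoples_daily_row))
widest = arrow_width(max(wikipedia_row), max(wikipedia_row))
```

Scaling the radius directly by the score would exaggerate differences, since a node with twice the score would then cover four times the area.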
Note that the visibility scores are distributed quite unevenly, and
thus only the top 20 websites are marked with their respective ranking
numbers. User-generated encyclopedias are the most visible
websites (node 1: Chinese Wikipedia, node 2: Baidu Baike, node
3: Hudong). Moreover, Chinese Wikipedia (1) is highly visible in
almost all variants except Yahoo_CN and Baidu_CN, while
Baidu Baike (2) is highly visible in Baidu_CN, Google_CN and
Google_SG, and moderately so in Google_HK.
Based on the previous clustering results, two red dashed lines are also
drawn in Figure 4, roughly delimiting three areas. Positioned in the
middle are the search engine variants in Cluster B, because of their
converging patterns of strong ties with most websites. The two red
dashed lines place the search engine variants in Cluster A to the
left and those in Cluster C to the right, indicating diverging effects
due to the presence of weak ties. This explains why Clusters A
and C are shown adjacent to Cluster B, but not to
each other. The visualization is thus consistent with the findings
shown in Table 5.
The findings can also be unpacked by specific search
engine variant. Using the same method, the Chinese names of the
Fortune 500 companies are added as 500 additional queries to the
selection of 2,500 search queries, producing a second dataset in
2012 (Liao, 2013a). The following paragraphs unpack this second
dataset for two search engine variants in mainland China:
Google_CN (see Table 6) and Baidu_CN (see Table 7).
The top-20 websites for each category of search
queries on Google_CN, shown in Table 6, indicate that baidu.com
ranks first in five of the seven categories. Wikipedia.org is a close
second for Google_CN, supporting the general observation that search
engines favour user-generated encyclopedias. These
findings also provide some counter-evidence against the idea that
Google as a specific company favours Wikipedia as a website,
because Google_CN actually favours Baidu Baike over
Chinese Wikipedia, as shown in Table 6.
Figure 4. Delineating the boundaries of geo-linguistic settings based on SERPs.
[Figure 4 legend: nodes numbered 1 to 20 are the top-20 websites in the ranking order of Table 2, from zh.wikipedia.org (1) to baike.soso.com (20).]
The findings for Baidu_CN in Table 7 show even more dominance
by Baidu Baike: baidu.com ranks first in all seven categories, and
the visibility scores are far more concentrated
than in the results for Google_CN (see Table 6). In
addition, the ranking position of hudong.com
seems to lend support to the accusation of unfair competition made
by Hudong’s CEO against Baidu (Yang, 2011). Depending on the
type of search query, hudong.com is ranked by Google_CN from
3rd to 9th (see Table 6). In contrast, it is not even
among the top 20 for many categories of the sampled queries on
Baidu Search. Indeed, if Google’s SERPs can serve as an
independent third party in the competition between Baidu Baike
and Hudong, Google does not render Hudong nearly invisible as
Baidu does.
Hence, if users in mainland China used Google Search instead of
Baidu Search, Chinese Wikipedia would become far more visible to
them, much closer to the visibility of Baidu Baike.
5. DISCUSSION
By systematically analysing the SERPs collected across four major
Chinese-speaking regions, this study shows that patterns of merging
and diverging do exist. This is achieved by treating visibility scores
as the equivalent of “social ties” between search engine variants on
the one hand and top-ranking websites on the other. Both the network
visualization and the blockmodeling outcomes show that geo-linguistic
factors make Chinese-language SERPs diverge on
certain websites while converging on others. In particular, of the
nine search engine variants, the first group that diverges from the
rest contains the search engine variants designed for mainland China
(Baidu_CN, Yahoo_CN and Google_CN). The second group
contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK
and Yahoo_TW).
The findings suggest that the major online boundary in the Chinese
internet is drawn first along the line of regional difference, with all
mainland Chinese search engine settings sharing similar SERPs
among themselves, but not with the others to the same degree, as
shown in Figure 4. Another boundary is drawn around Yahoo Taiwan
and Yahoo Hong Kong at the other end. The latter result is relatively
easy to explain, because Yahoo Search by default
prioritizes local content, with other geo-linguistic options
listed for users in the web interface: e.g. “search the
traditional Chinese-character-written web pages” or “search the
global websites”.
In contrast, it is relatively difficult to provide purely technical
explanations of why the three mainland Chinese
settings share less with the other settings in terms of their
SERPs. It is likely that many of the websites that
are absent from the SERPs of the three mainland Chinese settings
are those that are not politically welcome in mainland China. Note
that the first two columns in Table 4 represent Baidu_CN and
Yahoo_CN, both of which consistently have weak ties with several
of the top-100 websites. These two search engine variants are also
the only two that filter SERPs for users in mainland China.
Note also that the third column in Table 4 represents Google_CN.
While it is clustered with Baidu_CN and Yahoo_CN, it has more
strong ties with the top-100 websites, suggesting that its results
are less divergent.
The findings seem to suggest that users in mainland China, if
using only Baidu_CN and Yahoo_CN, will have a substantial
number of otherwise highly visible websites overlooked in or even
missing from their daily search experiences. These include
websites such as YouTube and Facebook, which have been reported
as blocked in mainland China. They also include the websites
of government and education institutions in Taiwan and Hong
Kong: gov.tw, gov.hk, edu.tw and edu.hk. In other words, the
SERPs of the three mainland Chinese variants seem to diverge from
these websites. In contrast, the websites of government and
education institutions in mainland China, gov.cn and edu.cn, are
still relatively visible in almost all other search engine variants
except the by-default-local Yahoo_TW and Yahoo_HK. Thus,
the patterns of merging and diverging seem to reflect the cultural
and political complications of the Chinese-language internet. While the
offline boundary between Hong Kong and Taiwan seems to be
overcome, that between mainland China and Hong Kong seems to
be reinforced. Although the SERP data may not perfectly reflect
what users actually read and click, they nonetheless indicate a general
probabilistic tendency substantiated by industry data.
Table 6 Results for Google_CN
Columns, left to right (website and its proportion of visibility scores for each category of search queries): Ranking | The Cambridge Encyclopedia of China | Top 10 Search Terms (Google and Baidu) | Best Film/Popular Music (China, Hong Kong, Taiwan) | Modern Concepts (shared with modern Japanese) | Notable People | Potentially sensitive terms | Fortune500
1 | baidu.com 47.65% | baidu.com 25.08% | baidu.com 36.44% | baidu.com 37.28% | wikipedia.org 28.98% | baidu.com 27.89% | mbalib.com 27.99%
2 | wikipedia.org 25.36% | wikipedia.org 12.94% | wikipedia.org 15.33% | wikipedia.org 24.13% | baidu.com 26.82% | wikipedia.org 25.14% | baidu.com 16.67%
3 | hudong.com 8.74% | sina.com.cn 12.06% | sina.com.cn 9.46% | hudong.com 11.00% | hudong.com 7.66% | hudong.com 9.63% | fortunechina.com 13.65%
4 | sina.com.cn 2.58% | qq.com 6.67% | douban.com 5.00% | mbalib.com 3.55% | sina.com.cn 7.10% | sina.com.cn 7.17% | wikipedia.org 8.74%
5 | ifeng.com 2.03% | 163.com 6.01% | qq.com 4.45% | sina.com.cn 3.18% | xinhuanet.com 4.81% | sohu.com 3.59% | qq.com 4.09%
6 | artxun.com 1.33% | sohu.com 5.86% | hudong.com 3.60% | people.com.cn 2.66% | people.com.cn 4.03% | people.com.cn 3.34% | qkankan.com 3.79%
7 | soso.com 1.30% | hudong.com 4.27% | sohu.com 3.33% | qq.com 2.64% | qq.com 3.39% | xinhuanet.com 2.95% | sina.com.cn 3.62%
8 | zdic.net 1.13% | youku.com 4.26% | youku.com 3.14% | hc360.com 1.61% | ifeng.com 3.30% | youku.com 2.74% | ifeng.com 3.59%
9 | tiexue.net 1.07% | xinhuanet.com 3.29% | 163.com 3.09% | sohu.com 1.50% | 163.com 2.93% | qq.com 2.45% | hudong.com 3.41%
10 | cncn.com 1.06% | ifeng.com 2.78% | mtime.com 2.14% | 163.com 1.46% | sohu.com 2.30% | iciba.com 1.94% | gold678.com 3.20%
11 | xinhuanet.com 1.04% | douban.com 2.47% | youtube.com 1.77% | hexun.com 1.44% | weibo.com 1.71% | 163.com 1.90% | 163.com 2.43%
12 | artx.cn 1.03% | people.com.cn 2.31% | 1ting.com 1.63% | ifeng.com 1.43% | youtube.com 1.55% | ifeng.com 1.72% | ciipp.com 1.29%
13 | people.com.cn 0.96% | hexun.com 1.85% | weibo.com 1.58% | studa.net 1.26% | boxun.com 1.25% | 360doc.com 1.50% | sohu.com 1.15%
14 | youku.com 0.84% | huanqiu.com 1.59% | m1905.com 1.56% | 3edu.net 1.05% | hexun.com 0.78% | youtube.com 1.45% | egouz.com 1.12%
15 | 163.com 0.83% | youtube.com 1.57% | iqiyi.com 1.50% | 39.net 1.04% | renren.com 0.62% | sogou.com 1.25% | bitauto.com 1.11%
16 | sohu.com 0.73% | yahoo.com 1.51% | sogou.com 1.39% | edu.cn 1.02% | edu.tw 0.60% | tianya.cn 1.12% | people.com.cn 0.96%
17 | qq.com 0.63% | gov.tw 1.45% | tudou.com 1.35% | jrj.com.cn 1.00% | china.com.cn 0.58% | laonanren.com 1.03% | zol.com.cn 0.93%
18 | edu.tw 0.60% | iqiyi.com 1.43% | ifeng.com 1.27% | chinaacc.com 0.97% | libertytimes.com.tw 0.55% | hexun.com 0.89% | hexun.com 0.88%
19 | edu.cn 0.54% | weibo.com 1.32% | xiami.com 1.07% | xinhuanet.com 0.95% | twitter.com 0.53% | soso.com 0.81% | yup.cn 0.72%
20 | 5156edu.com 0.54% | tudou.com 1.27% | pptv.com 0.91% | youku.com 0.83% | yahoo.com 0.52% | cfdd.org.cn 0.76% | google.cn 0.66%
Table 7 Results for Baidu_CN
Columns, left to right (website and its proportion of visibility scores for each category of search queries): Ranking | The Cambridge Encyclopedia of China | Top 10 Search Terms (Google and Baidu) | Best Film/Popular Music (China, Hong Kong, Taiwan) | Modern Concepts (shared with modern Japanese) | Notable People | Potentially sensitive terms | Fortune500
1 | baidu.com 75.74% | baidu.com 64.17% | baidu.com 73.28% | baidu.com 81.56% | baidu.com 57.53% | baidu.com 69.54% | baidu.com 61.90%
2 | wikipedia.org 6.20% | youku.com 4.79% | youku.com 6.66% | wikipedia.org 2.41% | wikipedia.org 7.48% | wikipedia.org 5.30% | mbalib.com 7.62%
3 | hudong.com 1.98% | sina.com.cn 4.59% | iqiyi.com 2.57% | sina.com.cn 2.16% | qq.com 6.12% | sina.com.cn 3.38% | fortunechina.com 7.13%
4 | sina.com.cn 1.94% | qq.com 4.13% | douban.com 2.30% | qq.com 2.05% | sina.com.cn 5.00% | qq.com 3.23% | sina.com.cn 3.20%
5 | youku.com 1.86% | sohu.com 3.05% | tudou.com 1.91% | youku.com 1.59% | ifeng.com 2.82% | youku.com 2.17% | ifeng.com 2.27%
6 | soso.com 1.64% | iqiyi.com 2.73% | sina.com.cn 1.65% | xinhuanet.com 1.14% | people.com.cn 2.52% | sohu.com 1.73% | fx678.com 1.91%
7 | qq.com 1.61% | 163.com 2.32% | weibo.com 1.61% | www.gov.cn 1.10% | sohu.com 2.46% | xinhuanet.com 1.68% | zol.com.cn 1.73%
8 | ifeng.com 1.18% | tudou.com 1.91% | qq.com 1.55% | edu.cn 0.89% | xinhuanet.com 2.31% | 163.com 1.50% | wikipedia.org 1.73%
9 | douban.com 1.13% | xinhuanet.com 1.53% | xunlei.com 1.48% | ifeng.com 0.80% | 163.com 1.84% | tianya.cn 1.47% | qq.com 1.60%
10 | tiexue.net 0.89% | douban.com 1.28% | mtime.com 1.07% | sohu.com 0.78% | soso.com 1.68% | people.com.cn 1.40% | bitauto.com 1.53%
11 | weather.com.cn 0.88% | ifeng.com 1.24% | letv.com 0.78% | people.com.cn 0.74% | weibo.com 1.52% | hexun.com 1.27% | 163.com 1.38%
12 | edu.cn 0.61% | renren.com 1.19% | m1905.com 0.78% | douban.com 0.60% | uname.cn 1.40% | soso.com 1.17% | qkankan.com 1.37%
13 | xilu.com 0.59% | letv.com 1.14% | 163.com 0.73% | 163.com 0.59% | renren.com 1.36% | douban.com 1.04% | gongchang.com 1.05%
14 | xinhuanet.com 0.58% | weibo.com 0.97% | verycd.com 0.68% | rayli.com.cn 0.59% | kaixin001.com 1.32% | tudou.com 0.89% | ticarefree.cn 1.04%
15 | 163.com 0.58% | wikipedia.org 0.97% | sohu.com 0.55% | hao123.com 0.57% | douban.com 0.97% | bitauto.com 0.88% | soso.com 0.86%
16 | guoxue.com 0.57% | zol.com.cn 0.93% | 1ting.com 0.53% | jrj.com.cn 0.50% | youku.com 0.85% | ifeng.com 0.73% | yingjiesheng.com 0.83%
17 | 360buy.com 0.52% | xunlei.com 0.80% | pptv.com 0.50% | huanqiu.com 0.49% | 360buy.com 0.78% | sensagent.com 0.70% | autohome.com.cn 0.74%
18 | qidian.com 0.51% | taobao.com 0.80% | ku6.com 0.48% | iqiyi.com 0.48% | www.gov.cn 0.73% | hudong.com 0.66% | xgo.com.cn 0.73%
19 | tudou.com 0.51% | huanqiu.com 0.74% | yinyuetai.com 0.48% | bankcomm.com 0.47% | edu.cn 0.73% | yangbihu.com 0.65% | eastmoney.com 0.70%
20 | sohu.com 0.50% | 4399.com 0.71% | wikipedia.org 0.42% | chinaacc.com 0.46% | hudong.com 0.58% | tiexue.net 0.61% | people.com.cn 0.68%
6. CONCLUSION
The findings, visualized and analysed using network analysis
techniques, indicate strong localization effects on the
gatekeeping function of search engines, based on data covering
over 97% of the search engine market in four Chinese-speaking
regions. The findings also show that major user-generated
encyclopedias such as Baidu Baike and Chinese Wikipedia
dominate the SERPs with high rankings and visibility scores.
Because the geo-linguistic factors coincide with the different
cultural and political situations of these Chinese-speaking regions,
different localization variants produce divergent outcomes for
high-ranking encyclopedias and other websites, thereby indicating
strong effects of “network gatekeeping” by search engines in
exercising the gatekeeping bases of “display” and “localization”
(Barzilai-Nahon, 2008).
In addition, by examining the overall patterns of SERPs, I have
demonstrated the merging and diverging effects contributed by
search engine providers and by regional and language
settings. Different combinations of provider and geo-linguistic
settings lead to different “search engine variants”. Nine major
search engine variants, covering four regions with Chinese-speaking
majority populations, are identified for the Chinese-language
internet. For a selected set of search queries covering
major Chinese cultural and political topics, I have found that the
SERPs converge on a specific type of website (i.e. user-generated
encyclopedias), and that some search engine variants converge more
on Baidu Baike while others converge more on Chinese Wikipedia. The
merging and diverging patterns are further analysed by both network
visualization and network analysis (blockmodeling analysis of
two-mode networks). The different patterns indicate that both
“nationalization” of a specific kind (i.e. mainland China) and
“trans-nationalization” (i.e. Hong Kong and Taiwan) can be
achieved through the different gatekeeping options offered by various
search engine variants.
The results show that the SERPs are more likely to converge when
the geo-linguistic preferences are similar. For example, the SERPs
diverge the most when users choose different Chinese character sets
(i.e. simplified Chinese versus traditional Chinese). It is then
particularly intriguing that all Hong Kong variant results converge
more with Taiwanese variant ones and much less so with mainland
Chinese variants, even though Hong Kong is much closer to mainland
China geographically, politically and administratively. In addition,
Chinese Wikipedia is much more visible in these regions than in
mainland China. Though the findings here cannot further
separate the geo-linguistic factors from cultural and political ones,
the converging and diverging patterns alone are important findings
for Chinese-internet research and Wikipedia research.
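One simple way to express such convergence numerically is the Jaccard
overlap between the result sets of two variants. The sketch below is an
assumption made for illustration only: the measure is not necessarily
the one used in this study, and the variant labels and domains are
invented toy values rather than reported results.

```python
from itertools import combinations

# Hypothetical toy data: search engine variant -> result domains for one query.
serps = {
    "baidu_cn": ["baike.baidu.com", "zhidao.baidu.com", "sina.com.cn"],
    "google_hk": ["zh.wikipedia.org", "baike.baidu.com", "youtube.com"],
    "yahoo_tw": ["zh.wikipedia.org", "tw.knowledge.yahoo.com", "youtube.com"],
}


def jaccard(a, b):
    """Share of domains common to both lists, relative to their union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def pairwise_overlap(serps):
    """Return {(variant_1, variant_2): Jaccard overlap} for every pair."""
    return {
        (v1, v2): jaccard(serps[v1], serps[v2])
        for v1, v2 in combinations(sorted(serps), 2)
    }


overlap = pairwise_overlap(serps)
```

A higher value for a pair of variants means that their SERPs converge
more; a value of zero means that they share no result domain.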
There are of course obvious limitations to the findings presented
above. First, the selection of search queries, while significantly larger
than in previous social scientific research on Chinese-language search
engines (Jiang & Akhtar, 2011), is still limited. Second, due to
space limitations, this paper has not yet fully unpacked the
different findings for different categories of search queries. Third,
only standard Mandarin Chinese terms are used for this research,
overlooking other possibilities such as written Cantonese queries (Chau,
Fang, & Yang, 2007). Fourth, though not last, only the default setting for
each localized search engine is analysed.
While the dataset presented may be limited in the scope of selected
search queries, time and search engine variants, I have
demonstrated the usefulness and viability of examining the merging
and diverging patterns that arise from the search engine variants, each
of which corresponds to a segment of the search engine market. For
instance, it can help online linguistics research by analysing
different SERP outcomes for regions that use a shared writing
system but with regional variants, such as the difference between
Egyptian Arabic and Maghrebi Arabic. For another example, these
geo-linguistic factors can be said to constitute one of the most
important online “situations” for online media, as defined by
medium theorists in the tradition of media ecology (Meyrowitz,
1986, 1994), because these factors set the patterns of access.
According to a statistical report by the Data Center of China
Internet, during the first half of 2010 the content produced by
amateur Chinese Internet users surpassed that produced by
professional websites (Liao, 2013b; Qiang, 2010). Thus
user-generated content by Chinese Internet users is expected to have
influenced user-generated encyclopedias directly and SERPs
indirectly. While this study has not yet addressed the relationship
among search engines, user-generated content and user-generated
encyclopedias, the findings here seem to suggest similar
geographic and linguistic dynamics. The clear outcome of
“network gatekeeping”, identified by Chinese search engine
variants and their respective preferred encyclopedias, may point to
a larger online context for Chinese Internet users across regions.
For future research, it will be useful to examine how geographic
and linguistic factors may influence the network gatekeeping
processes inside user-generated encyclopedias (Liao, 2009). It is
likely that they also exercise the gatekeeping bases of “display” and
“localization” as search engines do.
The overall method can be systematically extended to other
contexts. Various search engine variants can be chosen for research
on almost any other language in the world, including languages
with transnational adoption such as Arabic, Hindi, Tamil, English,
Spanish, Portuguese, etc. Researchers can thus further interpret the
merging and diverging SERP outcomes for research questions that
are relevant for global, transnational or inter-cultural
communications on the one hand, and another set of questions for
human-computer interaction and information systems on the other.
Also, the focus on examining geo-linguistic factors as important
variables for understanding search engines can contribute to the
development of geo-linguistic analysis of the Web (Liao & Petzold,
2011; Petzold & Liao, 2011). It can also be adopted for market and
industry applications when geo-linguistic identifiers are central
(DePalma, 2002; Dunne, 2006).
In conclusion, the proposed method has potential for a wider
range of market and academic applications. The theoretical
implications may be extended to other websites or information
systems that produce or curate different outcomes based on the
geographic and linguistic preferences (or configurations) of users.
It highlights the role of geo-linguistic parameters as media “access
codes”, or set patterns of access to information as articulated by
medium theorists for TV research (Meyrowitz, 1986, 1994), or the
“network gatekeeping process” theorized by new information
science theory (Barzilai-Nahon, 2008). Localization has become
the new medium that has the (higher-level) messages of cultural
political integration, reintegration or fragmentation of users.
7. ACKNOWLEDGMENTS
This work was supported by the Taiwan National Science
Council’s Taiwan Merit Scholarships Program (NSC-095-SAF-I-
564-028-TMS) and supported in part by the Oxford Internet
Institute Scholarship. Special thanks to Ralph Schroeder, Bernie
Hogan, Scott Hales and Min Jiang for their advice and support.
8. REFERENCES
Aragón, P., Kaltenbrunner, A., Laniado, D., & Volkovich, Y.
(2012). Biographical Social Networks on Wikipedia - A cross-
cultural study of links that made history. In Proceedings of
WikiSym 2012. Retrieved from http://arxiv.org/abs/1204.3799
Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D.
(2012). Omnipedia: bridging the Wikipedia language gap. In
Proceedings of the 2012 ACM annual conference on Human
Factors in Computing Systems (pp. 1075–1084). Retrieved from
http://dl.acm.org/citation.cfm?id=2208553
Bar‐Ilan, J. (2006). Web links and search engine ranking: The case
of Google and the query “jew.” Journal of the American Society
for Information Science and Technology, 57(12), 1581–1589.
doi:10.1002/asi.20404
Barzilai-Nahon, K. (2008). Toward a theory of network
gatekeeping: A framework for exploring information control.
Journal of the American Society for Information Science and
Technology, 59(9), 1493–1512. doi:10.1002/asi.20857
Battelle, J. (2005). The Search: How Google and Its Rivals Rewrote
the Rules of Business and Transformed Our Culture (First
Edition.). Portfolio Hardcover.
BBC. (2011, March 31). Google’s China exit “exaggerated.” BBC.
Retrieved from http://www.bbc.co.uk/news/business-12917322
Benkler, Y. (2006). The Wealth of Networks: How Social
Production Transforms Markets and Freedom. New Haven and
London: Yale University Press. Retrieved from
http://www.congo-education.net/wealth-of-networks/
Bermejo, F. (2009). Audience manufacture in historical
perspective: from broadcasting to Google. New Media &
Society, 11(1-2), 133–154. doi:10.1177/1461444808099579
Brettel, M., & Spilker-Attig, A. (2010). Online advertising
effectiveness: a cross-cultural comparison. Journal of Research
in Interactive Marketing, 4(3), 176–196.
doi:10.1108/17505931011070569
Charlton, G. (2012, February 13). Why Wikipedia is top on Google:
the SEO truth no-one wants to hear. Econsultancy: Digital
Marketers United. Retrieved from
http://econsultancy.com/blog/9009-why-wikipedia-is-top-on-
google-the-seo-truth-no-one-wants-to-
hear?utm_campaign=bloglikes&utm_medium=socialnetwork&
utm_source=facebook
Chau, M., Fang, X., & Yang, C. C. (2007). Web searching in
Chinese: A study of a search engine in Hong Kong. Journal of
the American Society for Information Science and Technology,
58(7), 1044–1054. doi:10.1002/asi.20592
Chen, J. (2008). Essays on auction mechanisms and resource
allocation in keyword advertising (The University of Texas at
Austin). ProQuest.
CIC. (2009). China Search Engine Market Report 2009. Beijing,
China: China IntelliConsulting Corporation. Retrieved from
http://tech.sina.com.cn/z/2009ssdc/index.shtml
CNNIC. (2006, September 16). Chinese Search Engine Market
Survey Report 2006. China Internet Network Information
Center. Retrieved November 19, 2011, from
http://xtlv.cn/html/Dir/2006/11/06/4216.htm
CNNIC. (2007, September 26). 2007 Survey Report on Search
Engine Market in China. China Internet Network Information
Center. Retrieved November 19, 2011, from
http://www.cnnic.cn/html/Dir/2007/10/10/4838.htm
CNNIC. (2009, March 5). China Search Engine Report 2008
Advertisers and Users Behavior Study. (中国搜索引擎市场广
告主与用户行为研究报告). Retrieved November 19, 2011,
from http://www.cnnic.cn/html/Dir/2009/03/05/5483.htm
Couvering, E. V. (2004). New Media? The Political Economy of
Internet Search Engines. Presented at the International
Association of Media & Communications Researchers, Porto
Alegre, Brazil. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.
1900
Couvering, E. V. (2008). The History of the Internet Search Engine:
Navigational Media and the Traffic Commodity. In A. Spink &
M. Zimmer (Eds.), Web Search (Vol. 14, pp. 177–206). Berlin,
Heidelberg: Springer Berlin Heidelberg. Retrieved from
http://www.springerlink.com/content/xn75781g305j756h/
Čuhalev, J. (2006). Ranking of Wikipedia articles on search
engines for searches about its own articles (Seminar Task for
Internet Search Techniques and Business Intelligence class) (p.
7). Retrieved from
http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-
wikipedia-in-your-google-searches/
Dahlberg, L. (2005). The Corporate Colonization of Online
Attention and the Marginalization of Critical Communication?
Journal of Communication Inquiry, 29(2), 160–180.
doi:10.1177/0196859904272745
Damm, J. (2007). The Internet and the fragmentation of Chinese
society. Critical Asian Studies, 39, 273–294.
doi:10.1080/14672710701339485
DePalma, D. A. (2002). Internationalization and Localization. In
Business without borders: a strategic guide to global marketing.
New York: John Wiley and Sons.
Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized
blockmodeling of two-mode network data. Social Networks,
26(1), 29–53. doi:10.1016/j.socnet.2004.01.002
Dunleavy, P., Margetts, H., Bastow, S., Pearce, O., & Tinkler, J.
(2007). Government on the internet: progress in delivering
information and services online. UK: National Audit Office.
Retrieved from
http://www.nao.org.uk/publications/nao_reports/06-
07/0607529.pdf
Dunne, K. J. (2006). Perspectives on Localization. John Benjamins
Publishing Company.
Dutta, S., Dutton, W. H., & Law, G. (2011). The New Internet
World: A Global Perspective on Freedom of Expression,
Privacy, Trust and Security Online. SSRN eLibrary. Retrieved
from
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1810005
Stone, B., & Einhorn, B. (2010, November 11). How Baidu Won
China. BusinessWeek: Online Magazine. Retrieved from
http://www.businessweek.com/magazine/content/10_47/b4204
060242597_page_6.htm
Enquiro. (2007, June 15). Chinese Eye Tracking Study: Baidu Vs
Google. Retrieved July 9, 2009, from
http://searchengineland.com/chinese-eye-tracking-study-baidu-
vs-google-11477
Fallows, D. (2008). Search Engine Use. Pew Research Center’s
Internet & American Life Project. Retrieved November 19,
2011, from http://www.pewinternet.org/Reports/2008/Search-
Engine-Use.aspx
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using
collaborative filtering to weave an information tapestry.
Commun. ACM, 35(12), 61–70. doi:10.1145/138859.138867
Gray, M. (2007, May). Google Love Affair with Wikipedia -
Graywolf’s SEO Blog. Graywolf’s SEO Blog. Retrieved
December 2, 2011, from http://www.wolf-
howl.com/google/google-love-affair-with-wikipedia/
Hargittai, E. (2007). The Social, Political, Economic, and Cultural
Dimensions of Search Engines: An Introduction. Journal of
Computer‐Mediated Communication, 12(3), 769–777.
Hearne, R. (2006, August 12). SERP Click Through Rate of Google
Search Results – AOL-data.tgz – Want to Know How Many
Clicks The #1 Google Position Gets? Red Cardinal. Retrieved
December 2, 2011, from http://www.redcardinal.ie/search-
engine-optimisation/12-08-2006/clickthrough-analysis-of-aol-
datatgz/
Hecht, B., & Gergle, D. (2010). The tower of Babel meets web 2.0:
user-generated content and its applications in a multilingual
context. In Proceedings of the 28th international conference on
Human factors in computing systems (pp. 291–300). Retrieved
from http://dl.acm.org/citation.cfm?id=1753370
Hopkins, H. (2009, January 23). Britannica 2.0: Wikipedia Gets
97% of Encyclopedia Visits. Hitwise Intelligence: Analyst
Weblog. Retrieved from http://weblogs.hitwise.com/us-heather-
hopkins/2009/01/britannica_20_wikipedia_gets_9.html
Hussain, S., & Mohan, R. (2008). Localization in Asia Pacific. In
F. Librero & P. B. Arinto (Eds.), Digital Review of Asia Pacific
2007/2008. Orbicom and the International Development
Research Centre (IDRC). Retrieved from
http://www.idrc.ca/openebooks/377-5/
IDATE. (2011). World Internet Usage & Markets. IDATE
Consulting and Research. Retrieved from
http://www.idate.org/en/Research-store/Collection/Market-
Data-Reports_23/World-Internet-Usage-Markets_584.html
Jansen, B. J., Brown, A., & Resnick, M. (2007). Factors relating to
the decision to click on a sponsored link. Decision Support
Systems, 44(1), 46–59. doi:10.1016/j.dss.2007.02.009
Jansen, B. J., & Mullen, T. (2008). Sponsored search: an overview
of the concept, history, and technology. Int. J. Electronic
Business, 6(2), 114–131.
Jansen, J. (2011). Understanding Sponsored Search: Core
Elements of Keyword Advertising. Cambridge University Press.
Jiang, M., & Akhtar, A. (2011). Peer into the Black Box of Chinese
Search Engines: A Comparative Study of Baidu, Google, and
Goso. Presented at the 9th Chinese Internet Research
Conference (CIRC 2011), Washington, D.C.: Institute for the
Study of Diplomacy. Georgetown University.
Jones, R. (2007, June 26). 96.6% of Wikipedia Pages Rank in
Google’s Top 10. The Google Cache: Search Engine Marketing,
SEO & PPC. Retrieved December 2, 2011, from
http://www.thegooglecache.com/white-hat-seo/966-of-
wikipedia-pages-rank-in-googles-top-10/
Jucquois-Delpierre, M. (2007). Fictional reality or real fiction: how
can one decide?: The strengths and weaknesses of information
science concepts and methods in the media world. Journal of
Information, Communication & Ethics in Society, 5(2/3), 235–
252. doi:10.1080/14616700306488
Jung, G. (2008). The Increasing Relevance of Online Marketing.
GRIN Verlag.
Khanna, A. (2011, October 26). Google drives traffic to Wikipedia,
but half of readers look for Wikipedia content — Wikimedia
blog. Wikimedia Foundation: Global blog. Official blog.
Retrieved from http://blog.wikimedia.org/2011/10/26/search-
and-wikipedia/
Liao, H.-T. (2008). A webometric comparison of Chinese
Wikipedia and Baidu Baike and its implications for
understanding the Chinese-speaking Internet. In 9th annual
Internet Research Conference: Rethinking Community,
Rethinking Place. Copenhagen.
Liao, H.-T. (2009). Conflict and Consensus in the Chinese version
of Wikipedia. IEEE Technology and Society Magazine, 28(2),
49–56. doi:10.1109/MTS.2009.932799
Liao, H.-T. (2011). Needing to Have a Voice: Linguistic Grouping
in the Digital Networked Environment (ISD Working Papers in
New Diplomacy). Washington, D.C.: Institute for the Study of
Diplomacy. Georgetown University. Retrieved from
http://isd.georgetown.edu/files/Needing%20to%20Have%20a
%20Voice.pdf
Liao, H.-T. (2013a). How does Chinese localization influence
online visibility? A study on Chinese-language Search Engine
Result Pages (SERPs). (Accepted). To be presented at the 11th
Annual Chinese Internet Research Conference (CIRC 2013),
Oxford, UK.
Liao, H.-T. (2013b). “Online Encyclopedia” (网上/网络百科全书),
“User Generated Content” (用户生成内容). In L. Cheng (Ed.),
The Internet in China: An Encyclopedic Handbook of
Online Business, Information Distribution, and Social
Connectivity. Berkshire Publishing.
Liao, H.-T., & Petzold, T. (2011). Analysing geo-linguistic
dynamics of the World Wide Web: The use of cartograms and
network analysis to understand linguistic development in
Wikipedia. Cultural Science, 3(2).
Luyt, B., Goh, D., & Lee, C. S. (2009). Searching locally: a
comparison of Yehey! and Google. Online Information Review,
33(3), 499–510.
Malaga, R. A. (2008). Worst practices in search engine
optimization. Commun. ACM, 51(12), 147–150.
doi:10.1145/1409360.1409388
Margetts, H. Z., & Escher, T. (2006). Governing from the Centre?
Comparing the Nodality of Digital Governments. SSRN
eLibrary. Retrieved from
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1755762
Massa, P., & Scrinzi, F. (2012). Manypedia: Comparing Language
Points of View of Wikipedia Communities. In Proceedings of
WikiSym 2012. Retrieved from
http://orga.wikisym.org/ws2012/bin/download/Main/Program/
p13wikisym2012.pdf
Mazieres, A., & Huron, S. (2013). Toward Google Borders.
Presented at the Web Science. Retrieved from
http://hal.inria.fr/hal-00805048
McKenna, M. G., & Naftulin, H. (2000). Challenges in the
multicultural HCI development environment. In CHI ’00
extended abstracts on Human factors in computing systems (pp.
362–362). New York, NY, USA: ACM.
doi:10.1145/633292.633509
Meyrowitz, J. (1986). No sense of place: the impact of electronic
media on social behavior. New York; Oxford: Oxford
University Press.
Meyrowitz, J. (1994). Medium theory. In D. Crowley & D. Mitchell
(Eds.), Communication Theory Today. Stanford University
Press.
Morris, M., & Ogan, C. (2002). The Internet as Mass Medium. In
D. McQuail (Ed.), McQuail’s reader in mass communication
theory (pp. 134–145). London: SAGE.
Nguyen, C. (2011, March). Search Engine Market share by country.
Chandler Nguyen Digital Marketing Blog. Retrieved December
1, 2011, from http://www.chandlernguyen.com/2011/03/search-
engine-market-share-by-country-mar-2011.html
Nielsen Online. (2008). Wikipedia U.S. Web Traffic Grows 8,000
Percent In Five Years, Driven By Search. New York: Nielsen
Online. Retrieved from
http://news.softpedia.com/news/Wikipedia-Traffic-Mostly-
from-Google-85703.shtml
Petzold, T., & Liao, H.-T. (2011). Geo-linguistic analysis of the
World Wide Web: The use of cartograms and network analysis
to understand linguistic development in Wikipedia. In D. Araya,
Y. Breindl, & T. J. Houghton (Eds.), Nexus: New Intersections
in Internet Research (pp. 55–75). New York: Peter Lang.
Petzold, T., Liao, H.-T., Hartley, J., & Potts, J. (2012). A world map
of knowledge in the making: Wikipedia’s inter-language linkage
as a dependency explorer of global knowledge accumulation.
Leonardo: Art, Science and Technology, 45(3), 284–284.
doi:10.1162/LEON_a_00376
PricewaterhouseCoopers. (2011). IAB Internet Advertising
Revenue Report. New York; DC: The Interactive Advertising
Bureau. Retrieved from http://www.iab.net/AdRevenueReport
Qiang, X. (2010, July 23). User-generated content online now
50.7% of total. China Daily. Beijing. Retrieved from
http://www.chinadaily.com.cn/business/2010-
07/23/content_11042851.htm
Rogers, R., & Sendijarevic, E. (2012). Neutral or National Point of
View? A Comparison of Srebrenica articles across Wikipedia’s
language versions. In Wikipedia Academy: Research and Free
Knowledge (#wpac2012). Berlin. Retrieved from
http://wikipedia-
academy.de/2012/w/images/8/89/3_Paper_Richard_Rogers_E
mina_Sendijarevic.pdf
Russell, J. (2011). Why Yahoo! –not Google– rules Taiwan’s
webspace. Asian Correspondent. Retrieved December 1, 2011,
from http://asiancorrespondent.com/55695/focus-on-taiwan-
where-yahoo-not-google-rules-the-countrys-webspace/
Segev, E. (2008). Search Engines and Power: A Politics of Online
(Mis-) Information. Webology, 5(2). Retrieved November 19, 2011, from
http://www.webology.org/2008/v5n2/a54.html
SEMPO. (2011). SEMPO State of Search Marketing Report 2011.
SEMPO Institute. Retrieved from
http://econsultancy.com/uk/reports/sempo-state-of-search
Silverwood-Cope, S. (2012, February 8). Wikipedia: Page one of
Google UK for 99% of searches. Intelligent Positioning Blog.
Retrieved from
http://www.intelligentpositioning.com/blog/2012/02/wikipedia
-page-one-of-google-uk-for-99-of-searches/
Slingshot SEO. (2011). Google & Bing Click-Through Rates
(White paper). Retrieved from
http://www.slingshotseo.com/resources/white-papers/google-
ctr-study/
Spindler, S. (2010). Online Marketing: How to Increase
International Sales with Search Engine Optimisation. GRIN
Verlag.
StatCounter. (2011). Top 5 Search Engines in China/Hong
Kong/Singapore/Taiwan from Nov 2010 to Nov 2011.
StatCounter Global Stats. Retrieved December 1, 2011, from
http://gs.statcounter.com/#search_engine-CN-monthly-
201011-201111
Sunstein, C. R. (2002). Fragmentation and Cybercascades. In
Republic.Com. Princeton University Press.
The Cambridge encyclopedia of China. (1991) (2nd ed.).
Cambridge [England]; New York: Cambridge University Press.
Goldsmith, J., & Wu, T. (2006). Who Controls the Internet?
Illusions of a Borderless World. Oxford University
Press.
Varian, H. R. (2007). The Economics of Internet Search. Presented
at the Angelo Costa lecture, Rome. Retrieved from
http://people.ischool.berkeley.edu/~hal/Papers/2007/costa-
lecture.pdf
Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias:
evidence and possible causes. Information Processing &
Management, 40(4), 693–707.
Vaughan, L., & Zhang, Y. (2007). Equal Representation by Search
Engines? A Comparison of Websites across Countries and
Domains. Journal of Computer-Mediated Communication,
12(3). Retrieved from
http://jcmc.indiana.edu/vol12/issue3/vaughan.html
Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In
Search of the Ur-Wikipedia: Universality, Similarity, and
Translation in the Wikipedia Inter-language Link Network.
Retrieved from
http://www.grouplens.org/system/files/p3wikisym2012.pdf
Yang, Y. (2011, February 25). China’s “Wikipedia” Submits
Complaint about Baidu. Economic Observer News, 508, 28.
Young, R. D. (2011, August 10). Top Google Ranking Captures
18.2% of Clicks. Search Engine Watch (#SEW). Retrieved
December 2, 2011, from
http://searchenginewatch.com/article/2100616/Top-Google-
Ranking-Captures-18.2-of-Clicks-Study
Zhao, S., & Baldauf, R. B. J. (2007). Planning Chinese Characters:
Reaction, Evolution or Revolution? Springer.