
[Wikisym2013] serp revised_apa_notice




Only the abstract here is included in the proceedings of the WikiSym + OpenSym 2013 Conference (wsos2013). The full text is a work-in-progress draft, revised based on blind-review comments and suggestions. Please contact the author for the latest citation for this research.

How does localization influence online visibility of user-generated encyclopedias? A study on Chinese-language Search Engine Result Pages (SERPs)

Han-Teng Liao
Oxford Internet Institute
University of Oxford
Oxford, United Kingdom

ABSTRACT
Prior empirical and theoretical work has discussed the role that dominant search engines play in information gatekeeping on the Web, and there are reports on the high ranking of the Wikipedia website in search engine result pages (SERPs). However, little research has been conducted on non-Google search engines and non-English versions of user-generated encyclopedias. This paper proposes a method to quantify the “display” gatekeeping differences of the SERP ranking and presents findings based on Chinese SERP data. Based on 2,500 mainly Chinese-language search queries, the data set includes the SERP outcomes of four Chinese-speaking regions (mainland China, Singapore, Hong Kong and Taiwan) provided by three major search engines (Baidu, Google and Yahoo), covering over 97% of the search engine market in each region. The findings, analysed and visualized using network analysis techniques, demonstrate the following: major user-generated encyclopedias are among the most visible websites, and localization factors matter (certain search engine variants produce the most divergent outcomes, especially the mainland Chinese ones). The indicated strong effects of “network gatekeeping” by search engines also suggest similar dynamics inside user-generated encyclopedias.
Categories and Subject Descriptors
[Human-centered computing]: Collaborative and social computing – Collaborative filtering, Wikis, Empirical studies in collaborative and social computing
[Information Systems]: Web search engines – Collaborative filtering, Page and site ranking

General Terms
Management, Performance, Design, Human Factors, Theory

Keywords
Geo-linguistic analysis, network analysis, Network gatekeeping, Chinese Internet, Chinese characters, Localization, censorship.

1. INTRODUCTION
Using search engines is among the most popular online activities for users in the US (Fallows, 2008) and mainland China (CIC, 2009; CNNIC, 2009), and has been among the driving forces of the fast-growing online advertising platform (Varian, 2007; SEMPO, 2011; IDATE, 2011; PricewaterhouseCoopers, 2011). It has been reported (with speculation about why) that Google, the global leader among search engines, has consistently favoured Wikipedia, the global leader among user-generated encyclopedias, by showing relevant Wikipedia pages frequently and prominently in the search engine result pages (hereafter SERPs) (Charlton, 2012; Čuhalev, 2006; Gray, 2007; Jones, 2007; Silverwood-Cope, 2012). Independent market research by Nielsen Online and Hitwise Intelligence has demonstrated that Wikipedia not only dominates online visits for encyclopedia content, but also does so mainly because of the traffic directed by major Web search engines (Hopkins, 2009; Nielsen Online, 2008). Even the Wikimedia Foundation acknowledged that Google drives traffic to Wikipedia, but nonetheless argued that half of its readers did want to look for Wikipedia content (Khanna, 2011). Thus, as major websites that dominate traffic and user attention, Google and Wikipedia seem to be central in guiding users where to look.
However, most of the findings and discussions are limited to or predominantly focused on the English-language context (Battelle, 2005; Bermejo, 2009; Couvering, 2004, 2008; Dahlberg, 2005; Hargittai, 2007; Segev, 2008), and little effort has been made to understand whether such a phenomenon is specific to Google/Wikipedia or can be found for other major search engines and user-generated encyclopedias. In addition, the multilingual internet and the rise of non-English users on the Web have multiple implications for the “localization” effects on search engines. Localization (hereafter L10n), a process of adapting computer software or information systems for a group of users usually defined by national boundaries or geo-linguistic profiles (Hussain & Mohan, 2008; Liao, 2011; McKenna & Naftulin, 2000), is expected to influence users’ information-seeking practices. Both Google and Wikipedia provide localized content and interfaces designed to serve different groups of users.

Because Google (or other general-purpose search engines), Wikipedia (or other user-generated encyclopedias) and localization are likely to present and thus frame the Web differently for different groups of users, they effectively filter information for them. While such filtering can be described as gatekeeping by communication scholars, the fact that Web users can directly or indirectly participate in such information filtering processes has introduced techniques and theories of “collaborative filtering” (Benkler, 2006; Goldberg, Nichols, Oki, & Terry, 1992) and “network gatekeeping” (Barzilai-Nahon, 2008). Indeed, while Google and
WikiSym '13 August 05 - 07 2013, Hong Kong, China Copyright 2013 ACM 978-1-4503-1852-5/13/08 ...$15.00.
Wikipedia may concentrate Web traffic and command user attention as major global websites, users’ contributions of web content and links may also influence such filtering and gatekeeping outcomes, as demonstrated by the case of the Google query “Jew” (Bar Ilan, 2006): some users organized to help Wikipedia’s entry page on “Jew” rank higher in Google’s English-language SERPs. Thus, although both “collaborative filtering” (Benkler, 2006; Goldberg et al., 1992) and “network gatekeeping” (Barzilai-Nahon, 2008) are indeed about filtering and keeping information, the possibility of participation through user input makes them different from the filtering and gatekeeping processes in traditional media.

Nonetheless, I argue that geographic and linguistic factors may bound or limit such collaborative and networking possibilities and thus reintroduce national and/or linguistic boundaries on the Web. Indeed, as early as the early 2000s, researchers such as Zittrain and Sunstein raised the issue of localized search results filtering political content or fragmenting the public sphere (Morris & Ogan, 2002; Sunstein, 2002). For SERPs, the question of information control and linguistic boundaries remains, while the “borders” of national frameworks have been reintroduced in many aspects of technological and legal arrangements (University & School, 2006). In particular, Google’s initial collaboration with (or accommodation of) the Chinese government’s needs and its later exit from mainland China have demonstrated the intricate political and cultural dimensions of the “localization” of search engine services (Vaughan & Zhang, 2007; Einhorn, 2010). Thus, the research gap on the effects of localization on SERPs and non-English Wikipedia needs to be filled, including the prominent cases of Chinese-language and Arabic-language internet users, whose recent presence and participation in the new internet world have also attracted much attention (Dutta, Dutton, & Law, 2011).
In particular, in order to answer how search engines and/or user-generated encyclopedias reintroduce or shape national or social boundaries, more empirical work on L10n effects is needed (Aragón, Kaltenbrunner, Laniado, & Volkovich, 2012; Bao et al., 2012; Hecht & Gergle, 2010; Liao, 2008, 2011; Luyt, Goh, & Lee, 2009; Massa & Scrinzi, 2012; Mazieres & Huron, 2013; Petzold, Liao, Hartley, & Potts, 2012; Rogers & Sendijarevic, 2012; Warncke-Wang, Uduwage, Dong, & Riedl, 2012). L10n is also briefly discussed as a contributing factor to the “internationalization mechanisms” of “network gatekeeping” (Barzilai-Nahon, 2008), holding the key for researchers to understand the nationalization or internationalization dynamics of the Web.

For the Chinese-language internet, there are many localized versions provided by several major search engines, such as Yahoo China, Google Hong Kong and Google Taiwan. I call them search engine-locale variants (hereafter search engine variants). Do different search engine variants guide users from various Chinese-speaking regions to see the same websites regardless of which search engine they choose? Or do they see divergent SERPs? Prior empirical research has analysed SERPs inside mainland China, with the latest research, on 316 search query phrases of “Internet events” collected in 2009, indicating that Baidu Baike and Chinese Wikipedia have indeed ranked high in the SERPs (Jiang & Akhtar, 2011). However, that research focuses on (and thus is limited to) simplified Chinese users in mainland China, and its sample of search queries was based upon internet incidents that are politically controversial in mainland China. This paper contributes findings based on 2,500 search queries in 2011, covering not only more topics but also more Chinese-language search engines across more regions such as Hong Kong, Taiwan and Singapore.
Before presenting the methods and findings, the next section first provides a theoretical framework that captures the localization effects of search engines.

2. L10N OF SEARCH ENGINES
Observing how search engines categorise users is one of the practical ways to examine the impact of search engines on national and/or regional boundaries. As part of the industry practice of internationalization/localization (i18n/L10n), search engines provide different interfaces and services for different users, usually categorized by their geo-linguistic identifiers, using language codes such as zh-TW (Chinese in Taiwan), pt-BR (Portuguese in Brazil), and en-IN (English in India) (DePalma, 2002; Dunne, 2006). These identifiers in turn influence how content is aggregated, filtered and prioritised for users who share the same or similar language preferences. Online users and audiences are often partitioned accordingly by search engine marketing tools such as Google AdWords and Microsoft adCenter. Unlike in the globalized TV industry, where broadcasting and cable TV are still bound to geography, these geo-linguistic codes are configurable. For example, one can use the UK version of Google even when not in the UK.

To conceptualize the localization effects of search engines, this paper applies the “network gatekeeping” theory (Barzilai-Nahon, 2008) for the following reasons. First, localization was discussed as a contributing factor to the “internationalization mechanisms” of “network gatekeeping” (Barzilai-Nahon, 2008). Although the theory comes mainly from information science and aims to better understand information control in network settings, its multidisciplinary aspects (Jucquois-Delpierre, 2007) can help researchers understand how seemingly technical arrangements of computer software or information systems can have enormous effects on gatekeeping or controlling the flows and presentation of information.
Second, distinct from traditional gatekeeping theory, which focuses on the withholding or deletion of information, the network gatekeeping theory not only conceptualizes localization as part of the gatekeeping processes, but also emphasizes the “display” bases for such processes: “Presenting information in a particular visual form designed to catch the eye” (Barzilai-Nahon, 2008). Indeed, search engines visually present the results. Thus, to understand the localization effects of search engines, a data collection method must consider not only the localization parameters but also the visual display of search results.

I argue that locales in computing, a set of parameters that describes a user’s language, region and other interface preferences, constitute one of the most important online “situations” for online media. By “situations” I use the definition used by medium theorists in the tradition of media ecology: “situations as (social) information-systems that set the patterns of access to information” (Meyrowitz, 1986, 1994). Note that as medium theorists focus on the medium rather than on messages, the definition is particularly suitable for studying search engines, because some major companies including Google have resisted the idea that they are in the content or media industry by insisting that they are information companies. For media and communication scholars, the underlying question is less about Google’s industrial identity than about how online media in general can use locales to segment, fragment and integrate different media markets and/or audiences by using different information system settings. Thus, geographic and linguistic factors seem to “set the patterns of access to information”, as geo-linguistic situations are expected to determine which websites will be the most visible and most constantly appearing ones in the SERPs.
2.1 A Straight-forward Visibility Test
Because users often browse SERPs from top to bottom, various market research (Enquiro, 2007), social science research (Bar Ilan, 2006; Dunleavy, Margetts, Bastow, Pearce, & Tinkler, 2007; Margetts & Escher, 2006; Vaughan & Thelwall, 2004) and industry practices (Slingshot SEO, 2011) have measured the level of online visibility based on webometric data such as the positions in SERPs (more visible if higher up) and/or the number of incoming web links from other websites. These measurements provide the foundations for keyword search advertising (Brettel & Spilker-Attig, 2010; Chen, 2008; B. J. Jansen, Brown, & Resnick, 2007; B. J. Jansen & Mullen, 2008; J. Jansen, 2011; Jung, 2008; Malaga, 2008; Spindler, 2010). For marketing purposes, it is imperative to boost the ranking of a website for a target set of search terms (or search keywords). For the purpose of this research, the focus shifts to the medium role of search engines between users and webpages. As shown in Figure 1, search engines play the gatekeeping role by curating different sets of web pages for different groups of users characterized by their respective search engine variants. This functions as “network” gatekeeping because search engines often provide different rankings based on both user data and the inter-linking data among the web pages themselves.

Figure 1. Search engines as the “network gatekeeper” between users and web pages

To account for the difference made by the ranking positions in SERPs, this research proposes a method to quantify such “display” gatekeeping differences (Barzilai-Nahon, 2008). Because different SERP rankings suggest different levels of visibility, different scores can be assigned. One way to do so is to use click-through rate (hereafter CTR) data for SERPs. Commonly used in online advertising, CTR measures the number of clicks on a web link divided by the number of times it is shown to users (i.e. clicks/impressions).
For search engine marketing, CTR indicates the probability of a listed web link being clicked. Based on the arithmetic mean of the CTR for top-10 search results from five different sources (Hearne, 2006; Jones, 2007; Young, 2011), I plotted the scatter chart in Figure 2 to show the relationship between the SERP ranking and CTR. The top-ranking website is expected to receive more than 30% of the traffic while the second receives just a bit over 10%, and so on. The relationship between the SERP ranking and CTR seems to follow the power function $y = ax^{b}$. Thus a power regression analysis was done to provide the curve-fitting function $y = 0.2889x^{-1.078}$, with a high R² value (0.9934), suggesting a close fit. Thus for this research, the visibility scores are assigned accordingly based on the SERP ranking.

Figure 2. Click-through Rates depending on the ranking in the Search Engine Results Page (SERP)

While it is impossible to exhaust the SERPs to identify patterns of preferred websites, previous research has established that the top-10 search results in the first SERP occupy a significant proportion of users’ attention and actual clicks (Hearne, 2006; Jones, 2007; Young, 2011). Based on such estimated CTR data, different visibility scores can be assigned to websites depending on their ranking in the SERP, as shown in Figure 2. A high SERP ranking does not always guarantee users’ actual clicks. Nonetheless, it is justified to use CTR as a proxy for visibility scores for the purpose of this research: it is a best-effort attempt based on various sources of industry data.
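The fitted curve can be turned into a small scoring function. The following is a minimal sketch, assuming the visibility score of a result is simply the value of the fitted power function at its rank; the text does not say whether the fitted values or the averaged CTR values themselves were used, and the function name is hypothetical.

```python
# Sketch only: rank-to-score mapping based on the fitted power function
# y = 0.2889 * x^(-1.078) reported in the text. Using the fitted value
# directly as the visibility score is an assumption.
A = 0.2889   # coefficient reported in the text
B = -1.078   # exponent reported in the text


def visibility_score(rank: int) -> float:
    """Return the estimated CTR-based visibility score for a SERP rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return A * rank ** B


# Scores for the top-10 positions of a single SERP.
scores = {rank: visibility_score(rank) for rank in range(1, 11)}
for rank, score in scores.items():
    print(f"rank {rank:2d}: {score:.4f}")
```

Under this assumption the score falls off steeply with rank, so a first-place listing counts for roughly twice as much as a second-place one.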
2.2 Chinese Search Engine Markets
According to various survey, market and traffic reports from both inside and outside mainland China (CIC, 2009; CNNIC, 2006, 2007; Nguyen, 2011; Russell, 2011; StatCounter, 2011), three major search engines (Baidu, Google and Yahoo) dominate the search engine markets across four regions (mainland China, Singapore, Hong Kong, and Taiwan) and two Chinese script preferences (simplified Chinese for mainland China and Singapore; traditional Chinese for Hong Kong and Taiwan). Thus, nine search engine variants can be derived from the combinations of search engine providers and geo-linguistic preferences, which altogether cover over 97% of the market:

  • For mainland China (mostly simplified Chinese users), zh-cn: Baidu, Google (simplified Chinese) and Yahoo China
  • For Singapore (mostly simplified Chinese users), zh-sg: Google Singapore and Yahoo Singapore
  • For Hong Kong (mostly traditional Chinese users), zh-hk: Google Hong Kong and Yahoo Hong Kong
  • For Taiwan (mostly traditional Chinese users), zh-tw: Google Taiwan and Yahoo Taiwan

These variants are hereafter abbreviated as Baidu_CN, Google_CN, Yahoo_CN, Google_SG, Yahoo_SG, Google_HK, Yahoo_HK, Google_TW and Yahoo_TW. It is noted that Baidu continues to enjoy its lead in mainland China with Google in second place, after Google moved its mainland operations to Hong Kong (BBC, 2011). In Hong Kong and Taiwan, around 2010 to 2011, Google overtook Yahoo’s leading position, while maintaining its top position in Singapore (StatCounter, 2011).

With all these nine variants, will the SERPs merge on a similar set of websites or diverge? By answering this question, researchers can gain insights into the converging and diverging effects of search engines for Chinese-language users across these regions.
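The nine variants listed above can be derived mechanically from the region and provider combinations. The sketch below is illustrative only: the dictionary layout and the function name are assumptions, while the locale codes, script preferences, providers and labels come from the text.

```python
# Hypothetical layout: region code -> (locale, script preference, providers).
# The values are taken from the list above; the structure is an assumption.
MARKETS = {
    "CN": ("zh-cn", "simplified", ["Baidu", "Google", "Yahoo"]),
    "SG": ("zh-sg", "simplified", ["Google", "Yahoo"]),
    "HK": ("zh-hk", "traditional", ["Google", "Yahoo"]),
    "TW": ("zh-tw", "traditional", ["Google", "Yahoo"]),
}


def variant_labels(markets):
    """Build labels such as 'Google_TW' for every provider in every region."""
    return [
        f"{engine}_{region}"
        for region, (_locale, _script, engines) in markets.items()
        for engine in engines
    ]


VARIANTS = variant_labels(MARKETS)
print(len(VARIANTS), VARIANTS)
```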
[Figure 1 diagram: Users (often categorized by providers and geo-linguistic settings), Search Engines, Web pages]

[Figure 2 chart: x-axis, ranking in the Search Engine Results Page (1 to 10); y-axis, visibility scores (0% to 35%); series: weighted by CTR, unweighted; power trendline for the CTR-weighted series, y = 0.2889x^(-1.078), R² = 0.9934]
2.3 Merging and diverging effects of SERPs
If the aforementioned market surveys and traffic reports are correct, search engine users from Taiwan mostly filter web pages through the lens of the search engine variants Google_TW and Yahoo_TW. Those from Hong Kong mostly use Google_HK and Yahoo_HK, and so on. By conceptualizing search engines as a medium, the merging and diverging patterns of SERPs will also indicate whether users from these regions see similar websites when using different search engine providers. Hence, the SERP data may indicate which search engines overcome offline boundaries across these regions (if the SERPs converge on specific websites) and which reinforce them (if the SERPs diverge), thereby contributing to the general question of media and globalization in the case of search engines.

To do so, the proposed visibility tests, which quantify the top-ranking websites, can be used as an indication of search engines exercising their “display” gatekeeping power for certain websites. Based on the quantified display gatekeeping power, the visibility patterns can be systematically examined between (1) search engine variants and (2) visible websites. Moreover, visibility scores can be further aggregated (i.e. summed) over a selection of search queries, so as to better answer the different research questions that guide such a selection. Ideally, by exhausting visibility scores for various localized versions of SERPs over a large sample of search queries, researchers can better compare how visible a website is across different search engine variants, thereby paving the way for showing the merging and diverging patterns of the SERPs.

It should be noted that, borrowing from the academic research on webometric visibility and the industry practice of keyword advertising, the proposed framework and method are general enough for future studies regardless of the providers and/or geo-linguistic preferences of search engines. For example:
How different, or similar, are the SERPs provided by Yandex versus Google in Turkey? How different, or similar, are the SERPs provided by Google Hindi versus Google Urdu in India? The resulting visibility scores can be further visualized and analysed by various network analysis techniques. Thus, this method can answer these empirical questions, with results that can then be interpreted to explore the cultural and political implications of such patterns.

To showcase how the integrated method works, I choose to study the Chinese-language internet because its boundaries have several historical, cultural and political complications. For example, regions such as mainland China, Singapore, Hong Kong and Taiwan have different practices in democracy, free speech, human rights and Chinese scripts (Damm, 2007; Liao, 2009; Zhao & Baldauf, 2007).

3. DATA COLLECTION
To identify how search engine variants influence the Chinese-language SERPs, the top-10 results should provide enough indication.

3.1 Search Queries
First, I have selected about 2,500 search queries that are relevant to Chinese cultural and political topics. As summarized in Table 1, the selection includes all 990 entries in “The Cambridge Encyclopedia of China” (The Cambridge encyclopedia of China, 1991), the top 10 search terms provided respectively by Baidu and Google (including mainland China, Hong Kong and Taiwan variations) in various categories since 2007, major popular cultural references, notable people’s names and some other culturally and politically “sensitive” keywords. Although other selections or combinations are possible, this selection aims to focus this research on the prominence of user-generated encyclopedias across Chinese-speaking regions.
Table 1 Sources and numbers of search queries

Second, the sample keywords are transliterated into search queries according to the respective Chinese orthographic preferences (simplified Chinese for mainland China and Singapore; traditional Chinese for Hong Kong and Taiwan), making this research the first of its kind to compare SERPs across Chinese-language variants. Third, the top-10 SERPs are collected for the nine search engine variants that cover the four major Chinese-speaking regions of China, Singapore, Hong Kong and Taiwan. Then they are parsed and processed by the visibility tests, weighting higher-ranking websites with higher visibility scores.

3.2 Search Results
Around 22,000 web links are extracted from the SERPs based on the outcome of 2,500 search queries submitted across nine variations of search engines in 2011. These 22,000 web links correspond to around 25,000 unique domain names. Then the outcome is further consolidated manually by checking IP addresses to over 16,000 websites (e.g. the website of aggregates and Finally, all education and government websites are aggregated into respective top-level domain names, such as,, and

4. FINDINGS
To show how localization influences online visibility, the collected data of visibility scores are unpacked and analysed as follows.

4.1 Concentrated visibility scores
Figure 3 shows the respective proportion distribution and accumulative distribution of visibility scores for the top-100 most visible websites. It is evident that nearly 80% of the visibility scores are concentrated in the top-100 websites, and indeed three user-generated encyclopedia websites ranked highest: (1), (2) and (3) For the website, Chinese Wikipedia ( is the most visible; for, Baidu Baike ( is the most visible.
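The aggregation and concentration steps described above can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the input format, the function names and the toy domains are hypothetical, and the scoring function assumes the fitted power function from Section 2.1 is used directly as the visibility score.

```python
from collections import defaultdict


def visibility_score(rank, a=0.2889, b=-1.078):
    """Assumed rank-to-score mapping (fitted power function from Section 2.1)."""
    return a * rank ** b


def aggregate(serps):
    """Sum visibility scores per website and search engine variant.

    serps: iterable of (variant, query, ranked_domains) tuples, where
    ranked_domains lists the result domains from rank 1 downwards.
    Only the top-10 results are scored.
    """
    table = defaultdict(lambda: defaultdict(float))
    for variant, _query, domains in serps:
        for rank, domain in enumerate(domains[:10], start=1):
            table[domain][variant] += visibility_score(rank)
    return table


def cumulative_share(table):
    """Cumulative share of all visibility scores, most visible website first."""
    totals = sorted((sum(row.values()) for row in table.values()), reverse=True)
    grand_total = sum(totals)
    if grand_total == 0:
        return []
    shares, running = [], 0.0
    for total in totals:
        running += total
        shares.append(running / grand_total)
    return shares


# Made-up toy data; the domains are placeholders, not results from the study.
toy_serps = [
    ("Google_TW", "q1", ["a.example", "b.example", "c.example"]),
    ("Yahoo_TW", "q1", ["a.example", "c.example"]),
    ("Baidu_CN", "q1", ["b.example", "d.example"]),
]
table = aggregate(toy_serps)
shares = cumulative_share(table)
print([round(share, 3) for share in shares])
```

Reading the cumulative share at position 100 of a full data set would give the kind of concentration figure reported for the top-100 websites.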
Categories of Search Keywords                                     Numbers
The Cambridge Encyclopedia of China                                   990
Top 10 Search Terms (Google and Baidu)                                387
Best Film/Popular Music (China, Hong Kong, Taiwan)                    364
Modern Concepts (shared with modern Japanese)                         171
Notable People                                                        476
    Nobel Prize Winners of Chinese origin                              11
    Major Chinese Politicians                                         187
    Rich People (China, Hong Kong, Taiwan)                             82
    100 Contemporary Intellectuals (China)                            100
    Major Fugitives From Taiwan                                        17
    Victims of White Terror in Taiwan                                  79
Potentially Sensitive Terms                                           112
    Japanese AV porn stars                                             48
    Prosecuted and Sentenced Corrupted Chinese Officials               14
    Documented Filtered Words by Great Firewall                        50
Total                                                                2500
Figure 3. Concentrated visibility scores

Since the top-100 most visible websites account for more than 80% of the visibility scores, strong concentration effects are found. Thus, the following sub-section further examines these websites.

4.2 Tabulating visibility scores
Table 2 tabulates the top-100 ranking websites and their respective visibility scores for each search engine variant. Each cell shows the visibility score that a search engine variant has contributed to a particular website. For example, the first cell 34.30 indicates how much Baidu_CN has contributed to Chinese Wikipedia (

Table 2 Top-ranking websites: visibility scores

Note that the top three are all user-generated encyclopedias: Chinese Wikipedia, Baidu Baike and Hudong Baike. For another example, the official news website of Falun Gong ( which is ranked 18th) is completely blocked from Baidu’s results (i.e. the zero visibility score suggests that it never shows up in Baidu’s SERPs). This is in direct contrast to, say, Yahoo_HK in the third-last column, where it enjoys a visibility score higher than all mainland-based websites, including the Chinese official media People’s Daily ( which is ranked 15th), suggesting that the Falun Gong news website performs better even than People’s Daily on Yahoo Hong Kong. Therefore, Table 2 shows in detail which search engine variants favour which websites by citing and showing them more often and more prominently in SERPs, making them easier to find (at least for this selection of search queries). The top-ranking websites include major China-based portals (e.g.,,, and, US-based websites (e.g.,, mainland China-based news media websites (e.g.,, and the aggregated category of mainland Chinese government websites (i.e.
Table 2 orders the websites from the most visible at the top row to the least visible at the bottom row, while the order of search engine variants is decided first by search engine provider (from Baidu, Google to Yahoo) and then by region (from CN, HK, SG to TW). It is relatively difficult, however, to see any pattern right away from Table 2 as it is tabulated. In other words, although each cell in the table shows the specific propensity of a search engine variant to prefer a certain website in its SERPs, the table as a whole fails to show clearly which “group” of search engine variants favours which “set” of websites. To identify patterns of converging and diverging, I will use blockmodeling analysis in the next subsection to study the visibility scores in Table 2, each of which represents the strength of ties between search engines and websites. To avoid arbitrary clustering results produced by less consequential websites collected in the SERPs, only the top-100 most visible websites are considered for analysis.

4.3 Clustering using blockmodeling analysis
Cluster analysis is commonly used in exploratory data mining to find how different data points can be grouped based on statistical analysis of their similarities and differences. To find how “birds of a feather flock together” for the websites and search engine variants at hand, various clustering techniques can be applied, including agglomerative hierarchical clustering analysis, which produces a family tree that details how each data point can be grouped. Nonetheless, this study chooses blockmodeling analysis (Doreian, Batagelj, & Ferligoj, 2004) for the following reasons. First, a blockmodel analysis produces a simplified outcome that better suits the research question at hand: to identify the rough patterns, without needing the specific details of which website is closer to another.
Second, as will be shown later, a blockmodel analysis can greatly simplify a complex dataset to provide a succinct summary of the overall structure. Third, as researchers can and must design a blockmodel for the data points to fit, a blockmodel analysis is particularly useful for identifying converging and diverging patterns. It also provides a systematic way to see whether the data points fit the model or not. Fourth, a blockmodel can be seen as a simplified network, and thus it can help to produce a simplified visualization of network data.

It should be noted that the dataset can be seen as a two-mode network: different “nodes” of search engine variants give different visibility scores to different “nodes” of websites. It is thus equivalent to a network of visibility scores, in which high visibility scores indicate a strong “relationship”. It is a two-mode network because there are two types of nodes (i.e. search engine variants and websites) and relationships exist only between nodes of different types (i.e. the visibility score contributed by one search engine variant to one website).

4.3.1 A blockmodel design
Before detailing how the cluster outcome helps identify the merging and diverging patterns systematically, it is necessary to explain the basis on which I design the blockmodel in Table 3.
[Figure 3 chart: series Accumulative and Proportion; y-axis 0% to 80%; x-axis 0 to 100]

Table 2 columns: Ranking; Websites (Aggregated); Baidu_CN; Google_CN; Google_HK; Google_SG; Google_TW; Yahoo_CN; Yahoo_HK; Yahoo_SG; Yahoo_TW
1 34.30 272.37 611.39 304.15 586.50 24.46 833.95 254.00 721.01
2 661.93 410.28 174.04 433.81 125.52 72.44 39.10 508.05 4.88
3 5.30 107.93 71.29 107.92 57.31 267.17 2.54 168.23 0.35
4 385.80 51.36 13.29 53.21 9.93 20.52 7.17 102.80 1.65
5 59.18 76.85 21.69 69.33 16.63 41.70 2.04 35.29 0.68
6 0.10 0.03 0.29 0.36 93.46 20.33 140.07
7 0.46 5.14 21.14 7.21 64.29 0.06 30.61 21.07 102.98
8 40.27 41.23 13.00 37.26 11.64 57.85 2.07 23.35 0.95
9 0.29 8.39 66.03 9.04 68.63 45.20 4.96 19.00
10 25.46 38.94 20.30 32.29 15.61 43.03 5.29 34.84 3.57
11 20.89 32.82 10.08 27.34 8.08 38.97 3.18 22.11 1.57
12 25.59 34.68 10.78 31.51 10.00 32.31 2.52 14.56 0.87
13 0.29 1.93 8.96 2.26 19.00 88.33 8.31 33.61
14 42.04 29.12 10.32 19.34 8.41 36.38 1.03 15.31 0.64
15 14.54 23.19 16.00 23.82 18.14 20.97 17.81 11.43 13.39
16 21.73 28.47 15.41 26.79 13.95 9.75 4.27 33.78 2.53
17 26.13 27.18 21.02 27.71 20.06 11.50 1.70 19.31 0.40
18 1.05 27.34 2.23 33.05 34.57 3.93 36.62
19 25.67 25.13 11.86 24.39 9.67 16.70 4.20 10.12 2.56
20 11.08 7.60 1.31 5.93 1.05 29.16 0.29 63.30 0.04

To build a blockmodel, researchers have to make design decisions on
  6. 6. 6 the “connection types” (e.g. “complete” versus “null”) and the number of blocks. A block is said to be “complete” if all cells in that block indicate strong relationship and a block is said to be “null” if all cells in that block contain only weak or none relationship. Thus the three by three blockmodel in Table 3 assumes the data points will fit into nine blocks. For this study, nine search engines will be divided into three groups, and the top-100 websites will be categorized into three sets of websites. Table 3 Expected outcome of blockmodeling   The rationale behind this model is to identify converging and diverging patterns. The second part of the Table 3 shows how three groups of search engine variants (Cluster A, B and C) may converge or diverge on different sets of websites (Cluster X, Y and Z). Thus, I assume a middle ground of websites exist: for all search engine variants, there will be a set of websites that are all visible (i.e. Cluster Y). That is, Cluster A, B and C converge on Cluster Y with high visibility scores, indicated by the dark blocks containing strong ties (i.e. high visibility scores). To account for any deviation from the "converging" middle ground, I expect two blocks of low- visibility cells (i.e. weak or none relationship), as represented by two white cells in Table 3): one at the top-left and another at the bottom-right. Both blocks thus indicate the patterns of divergence, or lack of convergence. For this study, if all search engine variants converge on the same top visible websites, then there should be no patterns of divergence. Using this scenario of complete convergence as the null hypothesis (no difference in visibility patterns), I expect some evidence of diverging effects to reject the null hypothesis. If there is a significant number of websites in the low-visibility blocks (one at upper-left and another at lower-right corner), then the diverging patterns are identified accordingly. 
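The design described above can be made concrete with a small sketch. It encodes the expected three-by-three design as an image matrix and counts how many binarized cells deviate from it. The 0/1 coding of weak and strong ties, the toy websites and engines, and the function name `count_mismatches` are illustrative assumptions; this is not Pajek's optimization procedure, only a way to read the design.

```python
# Ideal 3x3 image matrix of the expected blockmodel: rows are website
# clusters (X, Y, Z), columns are search engine clusters (A, B, C).
# 1 = "complete" block (strong ties expected), 0 = "null" block (weak ties).
IDEAL_IMAGE = {
    "X": {"A": 0, "B": 1, "C": 1},
    "Y": {"A": 1, "B": 1, "C": 1},
    "Z": {"A": 1, "B": 1, "C": 0},
}


def count_mismatches(ties, website_cluster, engine_cluster, image=IDEAL_IMAGE):
    """Count cells whose observed tie (0 or 1) differs from its ideal block."""
    mismatches = 0
    for (website, engine), observed in ties.items():
        expected = image[website_cluster[website]][engine_cluster[engine]]
        if observed != expected:
            mismatches += 1
    return mismatches


# Toy data (hypothetical, not the paper's dataset): 3 websites x 3 engines.
website_cluster = {"site_x": "X", "site_y": "Y", "site_z": "Z"}
engine_cluster = {"engine_a": "A", "engine_b": "B", "engine_c": "C"}
ties = {
    ("site_x", "engine_a"): 0, ("site_x", "engine_b"): 1, ("site_x", "engine_c"): 1,
    ("site_y", "engine_a"): 1, ("site_y", "engine_b"): 0, ("site_y", "engine_c"): 1,
    ("site_z", "engine_a"): 1, ("site_z", "engine_b"): 1, ("site_z", "engine_c"): 0,
}

errors = count_mismatches(ties, website_cluster, engine_cluster)
print(f"{errors} of {len(ties)} cells deviate from the ideal blockmodel")
```

Under complete convergence every block would be “complete”, so any website that fits the two “null” blocks better than a “complete” one counts as evidence of divergence.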
4.3.2 Patterns of merging and diverging
Using the blockmodeling function provided by Pajek, a social network analysis tool, the 9-by-100 cells of strong versus weak ties are simplified into the three-by-three blockmodel, as shown in Table 4. For each cell, the color represents strong (dark) or weak (white) ties, and these cells are roughly partitioned into three-by-three blocks, thereby clustering the nine search engine variants into three groups and the 100 most visible websites into three sets. The match is not perfect: 87 cells out of 900 (9.67%) do not match the designed blockmodel. Given the space limitation, only the top-20 websites are shown in full.

As shown in Table 4, of the top 100 websites, 39 are categorized into the first cluster of websites (Cluster X), 13 into Cluster Y and 48 into Cluster Z. Among the top-20 most visible websites, the converging set of websites (Cluster Y) is thin, containing only one website. This website belongs to People’s Daily, the Chinese official party organ.
Table 4 Blockmodeling outcome

Ranking | Websites (Aggregated) | Baidu_CN | Yahoo_CN | Google_CN | Yahoo_SG | Google_SG | Google_TW | Google_HK | Yahoo_HK | Yahoo_TW
Cluster X (blockmodel for Clusters A, B, C: weak, strong, strong)
1 | | 34.30 | 24.46 | 272.37 | 254.00 | 304.15 | 586.50 | 611.39 | 833.95 | 721.01
6 | | 0.10 | 0.00 | 0.03 | 20.33 | 0.00 | 0.36 | 0.29 | 93.46 | 140.07
7 | | 0.46 | 0.06 | 5.14 | 21.07 | 7.21 | 64.29 | 21.14 | 30.61 | 102.98
9 | | 0.29 | 0.00 | 8.39 | 4.96 | 9.04 | 68.63 | 66.03 | 45.20 | 19.00
13 | | 0.29 | 0.00 | 1.93 | 8.31 | 2.26 | 19.00 | 8.96 | 88.33 | 33.61
18 | | 0.00 | 0.00 | 1.05 | 3.93 | 2.23 | 33.05 | 27.34 | 34.57 | 36.62
… and other 33 websites (The total number of websites is 39 for this block)
Cluster Y (blockmodel for Clusters A, B, C: strong, strong, strong)
15 | | 14.54 | 20.97 | 23.19 | 11.43 | 23.82 | 18.14 | 16.00 | 17.81 | 13.39
… and other 12 websites (The total number of websites is 13 for this block)
Cluster Z (blockmodel for Clusters A, B, C: strong, strong, weak)
2 | | 661.93 | 72.44 | 410.28 | 508.05 | 433.81 | 125.52 | 174.04 | 39.10 | 4.88
3 | | 5.30 | 267.17 | 107.93 | 168.23 | 107.92 | 57.31 | 71.29 | 2.54 | 0.35
4 | | 385.80 | 20.52 | 51.36 | 102.80 | 53.21 | 9.93 | 13.29 | 7.17 | 1.65
5 | | 59.18 | 41.70 | 76.85 | 35.29 | 69.33 | 16.63 | 21.69 | 2.04 | 0.68
8 | | 40.27 | 57.85 | 41.23 | 23.35 | 37.26 | 11.64 | 13.00 | 2.07 | 0.95
10 | | 25.46 | 43.03 | 38.94 | 34.84 | 32.29 | 15.61 | 20.30 | 5.29 | 3.57
11 | | 20.89 | 38.97 | 32.82 | 22.11 | 27.34 | 8.08 | 10.08 | 3.18 | 1.57
12 | | 25.59 | 32.31 | 34.68 | 14.56 | 31.51 | 10.00 | 10.78 | 2.52 | 0.87
14 | | 42.04 | 36.38 | 29.12 | 15.31 | 19.34 | 8.41 | 10.32 | 1.03 | 0.64
16 | | 21.73 | 9.75 | 28.47 | 33.78 | 26.79 | 13.95 | 15.41 | 4.27 | 2.53
17 | | 26.13 | 11.50 | 27.18 | 19.31 | 27.71 | 20.06 | 21.02 | 1.70 | 0.40
19 | | 25.67 | 16.70 | 25.13 | 10.12 | 24.39 | 9.67 | 11.86 | 4.20 | 2.56
20 | | 11.08 | 29.16 | 7.60 | 63.30 | 5.93 | 1.05 | 1.31 | 0.29 | 0.04
… and other 35 websites (The total number of websites is 48 for this block)
Legend: cell shading marks relatively strong versus weak ties; blockmodel shading marks strong versus weak blocks.

These blockmodeling findings also help identify the merging and diverging patterns of search engine variants. Cluster A contains Baidu_CN, Yahoo_CN and Google_CN; Cluster B contains Google_HK, Google_SG, Google_TW and Yahoo_SG; Cluster C contains Yahoo_HK and Yahoo_TW. The cluster outcome shown in Table 5 indicates patterns of both merging and diverging, determined by the choice of search engine variant.
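As a rough check on the reported partition, the sketch below averages the visibility scores of the twenty websites printed in Table 4 within each of the nine blocks. The block mean is an illustrative summary chosen here, not the criterion Pajek uses, and the remaining 80 websites are not available in the table, so the numbers only describe the printed rows. The helper name `block_means` is hypothetical.

```python
from statistics import mean

# Column order and cluster membership as reported in Tables 4 and 5.
ENGINES = ["Baidu_CN", "Yahoo_CN", "Google_CN", "Yahoo_SG", "Google_SG",
           "Google_TW", "Google_HK", "Yahoo_HK", "Yahoo_TW"]
ENGINE_CLUSTER = dict(zip(ENGINES, "AAABBBBCC"))

# rank -> (website cluster, visibility scores in ENGINES order), from Table 4.
ROWS = {
    1: ("X", [34.30, 24.46, 272.37, 254.00, 304.15, 586.50, 611.39, 833.95, 721.01]),
    6: ("X", [0.10, 0.00, 0.03, 20.33, 0.00, 0.36, 0.29, 93.46, 140.07]),
    7: ("X", [0.46, 0.06, 5.14, 21.07, 7.21, 64.29, 21.14, 30.61, 102.98]),
    9: ("X", [0.29, 0.00, 8.39, 4.96, 9.04, 68.63, 66.03, 45.20, 19.00]),
    13: ("X", [0.29, 0.00, 1.93, 8.31, 2.26, 19.00, 8.96, 88.33, 33.61]),
    18: ("X", [0.00, 0.00, 1.05, 3.93, 2.23, 33.05, 27.34, 34.57, 36.62]),
    15: ("Y", [14.54, 20.97, 23.19, 11.43, 23.82, 18.14, 16.00, 17.81, 13.39]),
    2: ("Z", [661.93, 72.44, 410.28, 508.05, 433.81, 125.52, 174.04, 39.10, 4.88]),
    3: ("Z", [5.30, 267.17, 107.93, 168.23, 107.92, 57.31, 71.29, 2.54, 0.35]),
    4: ("Z", [385.80, 20.52, 51.36, 102.80, 53.21, 9.93, 13.29, 7.17, 1.65]),
    5: ("Z", [59.18, 41.70, 76.85, 35.29, 69.33, 16.63, 21.69, 2.04, 0.68]),
    8: ("Z", [40.27, 57.85, 41.23, 23.35, 37.26, 11.64, 13.00, 2.07, 0.95]),
    10: ("Z", [25.46, 43.03, 38.94, 34.84, 32.29, 15.61, 20.30, 5.29, 3.57]),
    11: ("Z", [20.89, 38.97, 32.82, 22.11, 27.34, 8.08, 10.08, 3.18, 1.57]),
    12: ("Z", [25.59, 32.31, 34.68, 14.56, 31.51, 10.00, 10.78, 2.52, 0.87]),
    14: ("Z", [42.04, 36.38, 29.12, 15.31, 19.34, 8.41, 10.32, 1.03, 0.64]),
    16: ("Z", [21.73, 9.75, 28.47, 33.78, 26.79, 13.95, 15.41, 4.27, 2.53]),
    17: ("Z", [26.13, 11.50, 27.18, 19.31, 27.71, 20.06, 21.02, 1.70, 0.40]),
    19: ("Z", [25.67, 16.70, 25.13, 10.12, 24.39, 9.67, 11.86, 4.20, 2.56]),
    20: ("Z", [11.08, 29.16, 7.60, 63.30, 5.93, 1.05, 1.31, 0.29, 0.04]),
}


def block_means(rows):
    """Average visibility score per (website cluster, engine cluster) block."""
    cells = {}
    for website_cluster, scores in rows.values():
        for engine, score in zip(ENGINES, scores):
            block = (website_cluster, ENGINE_CLUSTER[engine])
            cells.setdefault(block, []).append(score)
    return {block: mean(values) for block, values in cells.items()}


means = block_means(ROWS)
for website_cluster in "XYZ":
    row = [round(means[(website_cluster, engine_cluster)], 2) for engine_cluster in "ABC"]
    print(website_cluster, row)
```

For the printed rows, the two blocks designed as weak (Cluster X under Cluster A, and Cluster Z under Cluster C) have lower means than the opposite ends of their rows, which is the pattern the blockmodel is meant to capture.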
Of the three groups of search engine variants, two deviate from the rest. The first group (Cluster A) contains the search engine variants designed for mainland China (Baidu_CN, Yahoo_CN and Google_CN), and the second group (Cluster C) contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK and Yahoo_TW). Thus, while the search engine variants in Cluster B produce converging results for the top-100 websites, with “complete” connection types to all clusters of websites, those in Cluster A and those in Cluster C lead to diverging SERPs.

Table 3, first part (expected connection types)
Websites | Cluster A | Cluster B | Cluster C
Cluster X | null | complete | complete
Cluster Y | complete | complete | complete
Cluster Z | complete | complete | null

Table 3, second part (expected convergence)
Websites | Cluster A | Cluster B | Cluster C
Cluster X | | |
Cluster Y | converging | converging | converging
Cluster Z | | |

Table 5 Clusters identified by blockmodeling
Websites | # | Cluster A (Baidu_CN, Google_CN, Yahoo_CN) | Cluster B (Google_HK, Google_SG, Google_TW, Yahoo_SG) | Cluster C (Yahoo_HK, Yahoo_TW)
Cluster X | 39 | null | complete | complete
Cluster Y | 13 | complete | complete | complete
Cluster Z | 48 | complete | complete | null

4.4 Visualizing and unpacking findings
To show the visibility scores in a more intuitive manner, a network visualization of the top-800 most visible websites is shown in Figure 4. I visualize the nine search engine variants (shown as text boxes at the periphery) and the 800 most visible websites (shown as nodes in the middle). The two-mode network is thus presented in a way that indicates the overall likelihood that a given search engine variant recommends a website shown in the middle. Each arrow points from one search engine variant node to one website node and represents the total visibility score contributed by that search engine variant to that website, with arrow width proportional to the visibility score: wider arrows indicate higher visibility scores. Similarly, the area of a node is proportional to the sum of the visibility scores that a website receives from all search engine variants, allowing easy comparison of which websites are more visible. Note that the visibility scores are distributed quite unevenly, and thus only the top 20 websites are marked with their respective ranking numbers.

User-generated encyclopedias are the most visible websites (node 1: Chinese Wikipedia, node 2: Baidu Baike, node 3: Hudong). Moreover, Chinese Wikipedia (1) is highly visible in almost all variants except Yahoo_CN and Baidu_CN, while Baidu Baike (2) is highly visible in Baidu_CN, Google_CN and Google_SG, and moderately so in Google_HK.

Based on the previous clustering results, two red dashed lines are also drawn in Figure 4, roughly indicating three areas. Positioned in the middle are the search engine variants in Cluster B, because of their converging patterns of strong ties with most websites. The two red dashed lines also place the search engine variants in Cluster A to the left and those in Cluster C to the right, indicating diverging effects because of the presence of weak ties. This explains why Cluster A and Cluster C are shown adjacent to Cluster B, but not adjacent to each other. This visualization is thus consistent with the findings shown in Table 5.
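The scaling rules described for Figure 4 (arrow width proportional to a visibility score, node area proportional to a website's summed score) can be sketched as follows. The maximum width and radius are arbitrary drawing constants assumed for illustration, and the function names are hypothetical; the two score rows are copied from Table 4 for the two most visible websites.

```python
import math


def arrow_width(score, max_score, max_width=8.0):
    """Arrow width grows linearly with the visibility score."""
    return max_width * score / max_score


def node_radius(total_score, max_total, max_radius=40.0):
    """Radius chosen so that the node AREA is proportional to the total score."""
    return max_radius * math.sqrt(total_score / max_total)


# Visibility scores per search engine variant for nodes 1 and 2 (Table 4).
scores = {
    1: [34.30, 24.46, 272.37, 254.00, 304.15, 586.50, 611.39, 833.95, 721.01],
    2: [661.93, 72.44, 410.28, 508.05, 433.81, 125.52, 174.04, 39.10, 4.88],
}

totals = {node: sum(values) for node, values in scores.items()}
max_total = max(totals.values())
max_score = max(max(values) for values in scores.values())

radii = {node: node_radius(total, max_total) for node, total in totals.items()}
widths = {node: [arrow_width(s, max_score) for s in values]
          for node, values in scores.items()}

for node in scores:
    print(node, round(radii[node], 1), [round(w, 2) for w in widths[node]])
```

Taking the square root matters: scaling the radius directly by the score would make the drawn area grow with the square of the score and overstate the most visible websites.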
The findings can also be unpacked by specific search engine variant. Using the same method, an additional 500 Chinese names of the Fortune 500 companies were added to the selection of 2,500 search queries, producing a second dataset in 2012 (Liao, 2013a). The following paragraphs unpack this second dataset for two search engine variants in mainland China: Google_CN (see Table 6) and Baidu_CN (see Table 7).

The results for the top-20 websites in each category of search queries for Google_CN, shown in Table 6, indicate that Baidu Baike ranks top in almost all categories. Chinese Wikipedia is a close second for Google_CN, supporting the general observation that search engines favour user-generated encyclopedias. These findings also provide some counter-evidence against the idea that Google as a specific company favours Wikipedia as a website, because Google_CN actually favours Baidu Baike more than Chinese Wikipedia, as clearly shown in Table 6.

The findings for Baidu_CN in Table 7 show even more dominance by Baidu Baike: it dominates all seven categories, with a proportion of visibility scores that is much more concentrated than in the results of Google_CN (see Table 6). In addition, the ranking position of Hudong seems to confirm the unfair competition accusation made by Hudong’s CEO against Baidu (Yang, 2011). Depending on the type of search query, Hudong is ranked by Google_CN from 3rd to 9th (see Table 6).
In contrast, Hudong Baike is not even among the top 20 for many categories of the sampled queries in Baidu Search. Indeed, if Google’s SERPs can serve as an independent third party in the competition between Baidu Baike and Hudong, Google does not make Hudong almost invisible as Baidu does. Hence, if users from mainland China use Google Search instead of Baidu Search, Chinese Wikipedia will become as visible as Baidu Baike for them.

Figure 4. Delineating the boundaries of geo-linguistic settings based on SERPs. [Legend: ranking of websites (aggregated), 1 to 20]

5. DISCUSSION
By systematically analysing the SERPs collected across four major Chinese-speaking regions, this study shows that patterns of merging and diverging do exist. This is achieved by calculating visibility scores as the equivalent of “social ties” between search engine variants on the one hand and top-ranking websites on the other. Both the network visualization and the blockmodeling outcomes show that geo-linguistic factors make Chinese-language SERPs diverge on certain websites while converging on others. In particular, of the nine search engine variants, the first group that diverges from the rest contains the search engine variants designed for mainland China (Baidu_CN, Yahoo_CN and Google_CN). The second group contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK and Yahoo_TW).

The findings suggest that the major online boundary in the Chinese Internet is drawn first along the line of regional difference: all mainland Chinese search engine settings share similar SERPs among themselves, but not with the others to the same degree, as shown in Figure 4. Another boundary is drawn around Yahoo Taiwan and Yahoo Hong Kong at the other end. The latter result is relatively easy to explain because Yahoo Search prioritizes local content by default, with other geo-linguistic options listed for users in the web interface, e.g. “search the traditional Chinese-character-written web pages” or “search the global websites”. In contrast, it is relatively difficult to provide purely technical explanations for why the three mainland Chinese settings share so little with the other settings in terms of the corresponding SERPs.
It is likely that many of the websites that are absent from the SERPs in the three mainland Chinese settings are those that are not politically welcome in mainland China. Note that the first two columns in Table 4 represent Baidu_CN and Yahoo_CN, both of which consistently have weak ties with several of the top 100 websites. These two search engine variants are also the only two that filter SERPs for users in mainland China. Note also that the third column in Table 4 represents Google_CN. While it is clustered with Baidu_CN and Yahoo_CN, it has more strong ties with the top 100 websites, suggesting that its results are less divergent. The findings seem to suggest that users from mainland China, if using only Baidu_CN and Yahoo_CN, will have a substantial number of otherwise highly visible websites overlooked or even missing from their daily search experiences. These include websites such as YouTube and Facebook, which have been reported as blocked in mainland China. They also include the websites of government and education institutions in Taiwan and Hong Kong.

Table 6 Results for Google_CN
Ranking | Fortune500 | The Cambridge Encyclopedia of China | Top 10 Search Terms (Google and Baidu) | Best Film/Popular Music (China, Hong Kong, Taiwan) | Modern Concepts (shared with modern Japanese) | Notable People | Potentially sensitive terms
1 | 47.65% | 25.08% | 36.44% | 37.28% | 28.98% | 27.89% | 27.99%
2 | 25.36% | 12.94% | 15.33% | 24.13% | 26.82% | 25.14% | 16.67%
3 | 8.74% | 12.06% | 9.46% | 11.00% | 7.66% | 9.63% | 13.65%
4 | 2.58% | 6.67% | 5.00% | 3.55% | 7.10% | 7.17% | 8.74%
5 | 2.03% | 6.01% | 4.45% | 3.18% | 4.81% | 3.59% | 4.09%
6 | 1.33% | 5.86% | 3.60% | 2.66% | 4.03% | 3.34% | 3.79%
7 | 1.30% | 4.27% | 3.33% | 2.64% | 3.39% | 2.95% | 3.62%
8 | 1.13% | 4.26% | 3.14% | 1.61% | 3.30% | 2.74% | 3.59%
9 | 1.07% | 3.29% | 3.09% | 1.50% | 2.93% | 2.45% | 3.41%
10 | 1.06% | 2.78% | 2.14% | 1.46% | 2.30% | 1.94% | 3.20%
11 | 1.04% | 2.47% | 1.77% | 1.44% | 1.71% | 1.90% | 2.43%
12 | 1.03% | 2.31% | 1.63% | 1.43% | 1.55% | 1.72% | 1.29%
13 | 0.96% | 1.85% | 1.58% | 1.26% | 1.25% | 1.50% | 1.15%
14 | 0.84% | 1.59% | 1.56% | 1.05% | 0.78% | 1.45% | 1.12%
15 | 0.83% | 1.57% | 1.50% | 1.04% | 0.62% | 1.25% | 1.11%
16 | 0.73% | 1.51% | 1.39% | 1.02% | 0.60% | 1.12% | 0.96%
17 | 0.63% | 1.45% | 1.35% | 1.00% | 0.58% | 1.03% | 0.93%
18 | 0.60% | 1.43% | 1.27% | 0.97% | 0.55% | 0.89% | 0.88%
19 | 0.54% | 1.32% | 1.07% | 0.95% | 0.53% | 0.81% | 0.72%
20 | 0.54% | 1.27% | 0.91% | 0.83% | 0.52% | 0.76% | 0.66%

Table 7 Results for Baidu_CN
Ranking | Fortune500 | The Cambridge Encyclopedia of China | Top 10 Search Terms (Google and Baidu) | Best Film/Popular Music (China, Hong Kong, Taiwan) | Modern Concepts (shared with modern Japanese) | Notable People | Potentially sensitive terms
1 | 75.74% | 64.17% | 73.28% | 81.56% | 57.53% | 69.54% | 61.90%
2 | 6.20% | 4.79% | 6.66% | 2.41% | 7.48% | 5.30% | 7.62%
3 | 1.98% | 4.59% | 2.57% | 2.16% | 6.12% | 3.38% | 7.13%
4 | 1.94% | 4.13% | 2.30% | 2.05% | 5.00% | 3.23% | 3.20%
5 | 1.86% | 3.05% | 1.91% | 1.59% | 2.82% | 2.17% | 2.27%
6 | 1.64% | 2.73% | 1.65% | 1.14% | 2.52% | 1.73% | 1.91%
7 | 1.61% | 2.32% | 1.61% | 1.10% | 2.46% | 1.68% | 1.73%
8 | 1.18% | 1.91% | 1.55% | 0.89% | 2.31% | 1.50% | 1.73%
9 | 1.13% | 1.53% | 1.48% | 0.80% | 1.84% | 1.47% | 1.60%
10 | 0.89% | 1.28% | 1.07% | 0.78% | 1.68% | 1.40% | 1.53%
11 | 0.88% | 1.24% | 0.78% | 0.74% | 1.52% | 1.27% | 1.38%
12 | 0.61% | 1.19% | 0.78% | 0.60% | 1.40% | 1.17% | 1.37%
13 | 0.59% | 1.14% | 0.73% | 0.59% | 1.36% | 1.04% | 1.05%
14 | 0.58% | 0.97% | 0.68% | 0.59% | 1.32% | 0.89% | 1.04%
15 | 0.58% | 0.97% | 0.55% | 0.57% | 0.97% | 0.88% | 0.86%
16 | 0.57% | 0.93% | 0.53% | 0.50% | 0.85% | 0.73% | 0.83%
17 | 0.52% | 0.80% | 0.50% | 0.49% | 0.78% | 0.70% | 0.74%
18 | 0.51% | 0.80% | 0.48% | 0.48% | 0.73% | 0.66% | 0.73%
19 | 0.51% | 0.74% | 0.48% | 0.47% | 0.73% | 0.65% | 0.70%
20 | 0.50% | 0.71% | 0.42% | 0.46% | 0.58% | 0.61% | 0.68%
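The difference in concentration between Tables 6 and 7 can be illustrated with a simple ratio of the top-ranked share to the second-ranked share in each category. This ratio is an illustrative measure assumed here, not one used in the paper, and the function name `dominance_ratios` is hypothetical. Values are copied from rows 1 and 2 of each table, in column order.

```python
# Shares (%) of the first- and second-ranked websites per query category,
# copied from rows 1 and 2 of Table 6 (Google_CN) and Table 7 (Baidu_CN).
GOOGLE_CN = {
    "first": [47.65, 25.08, 36.44, 37.28, 28.98, 27.89, 27.99],
    "second": [25.36, 12.94, 15.33, 24.13, 26.82, 25.14, 16.67],
}
BAIDU_CN = {
    "first": [75.74, 64.17, 73.28, 81.56, 57.53, 69.54, 61.90],
    "second": [6.20, 4.79, 6.66, 2.41, 7.48, 5.30, 7.62],
}


def dominance_ratios(shares):
    """Ratio of the top share to the runner-up share, one value per category."""
    return [first / second
            for first, second in zip(shares["first"], shares["second"])]


google_ratios = dominance_ratios(GOOGLE_CN)
baidu_ratios = dominance_ratios(BAIDU_CN)

for column, (g, b) in enumerate(zip(google_ratios, baidu_ratios), start=1):
    print(f"category {column}: Google_CN {g:.2f}x, Baidu_CN {b:.2f}x")
```

In every category the top-ranked website's lead over the runner-up is several times larger under Baidu_CN than under Google_CN, which is one way to read the statement that Baidu_CN results are much more concentrated.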
In other words, the SERPs of the three mainland Chinese variants seem to diverge from these websites. In contrast, the websites of government and education institutions in mainland China are still relatively visible in almost all other search engine variants, except for the by-default-local Yahoo_TW and Yahoo_HK. Thus, the patterns of merging and diverging seem to reflect the cultural-political complications of the Chinese-language internet. While the offline boundary between Hong Kong and Taiwan seems to be overcome, that between mainland China and Hong Kong seems to be reinforced. Although the SERP data may not perfectly reflect what users actually read and click, it nonetheless indicates a general probabilistic tendency substantiated by industry data.

6. CONCLUSION
The findings, visualized and analysed using network analysis techniques, clearly indicate strong localization effects on the gatekeeping function of search engines, based on data covering over 97% of the search engine market in four Chinese-speaking regions. The findings also show that major user-generated encyclopedias such as Baidu Baike and Chinese Wikipedia dominate the SERPs with high rankings and visibility scores. Because the geo-linguistic factors coincide with the different cultural-political situations of these Chinese-speaking regions, different localization variants produce divergent outcomes for high-ranking encyclopedias and other websites, thereby indicating strong effects of “network gatekeeping” by search engines in exercising the gatekeeping bases of “display” and “localization” (Barzilai-Nahon, 2008).

In addition, by examining the overall patterns of SERPs, I have demonstrated the merging and diverging effects contributed by search engine providers and by regional and language settings. Different combinations of such provider and geo-linguistic information lead to different “search engine variants”.
Nine major search engine variants, covering four regions with Chinese-speaking majority populations, are identified for the Chinese-language internet. For a selected set of search queries covering major Chinese cultural and political topics, I have found that the SERPs converge on a specific type of website (i.e. user-generated encyclopedias) and that some search engine variants converge more on Baidu Baike while others converge more on Chinese Wikipedia. The merging and diverging patterns are further analysed by both network visualization and network analysis (blockmodeling analysis of two-mode networks). The different patterns indicate that both “nationalization” of a specific kind (i.e. mainland China) and “trans-nationalization” (i.e. Hong Kong and Taiwan) can be achieved by the different gatekeeping options offered by various search engine variants.

The results show that SERPs are more likely to converge when geo-linguistic preferences are similar. For example, the SERPs diverge the most when users choose different Chinese characters (i.e. simplified Chinese versus traditional Chinese). It is then particularly intriguing that all Hong Kong variant results converge more with the Taiwanese variants and much less with the mainland Chinese variants, even though Hong Kong is much closer to mainland China geographically, politically and administratively. In addition, Chinese Wikipedia is much more visible in these regions than in mainland China. Though the findings here cannot separate the geo-linguistic factors from the cultural-political ones, the converging and diverging patterns alone are important findings for Chinese-internet research and Wikipedia research.

There are of course obvious limitations to the findings presented above. First, the selection of search queries, while significantly larger than in previous social scientific research on Chinese-language search engines (Jiang & Akhtar, 2011), is still limited.
Second, due to limitations of space, this paper has not yet fully unpacked the different findings for different categories of search queries. Third, only standard Mandarin Chinese terms are used in this research, overlooking the possibility of written Cantonese queries (Chau, Fang, & Yang, 2007). Fourth, but not last, only the default setting of each localized search engine is analysed.

While the dataset presented may be limited in the scope of selected search queries, time and search engine variants, I have demonstrated the usefulness and viability of examining the merging and diverging patterns produced by search engine variants, each of which corresponds to a segment of the search engine market. For instance, the method can help online linguistics research by analysing different SERP outcomes for regions that use a shared writing system with regional variants, such as Egyptian Arabic and Maghrebi Arabic. As another example, these geo-linguistic factors can be said to constitute one of the most important online “situations” for online media, as defined by medium theorists in the tradition of media ecology (Meyrowitz, 1986, 1994), because these factors set the patterns of access.

According to a statistical report by the Data Center of China Internet, during the first half of 2010 the content produced by amateur Chinese Internet users surpassed that produced by professional websites (Liao, 2013b; Qiang, 2010). Thus user-generated content by Chinese Internet users is expected to have influenced user-generated encyclopedias directly and SERPs indirectly. While this study has not yet addressed the relationship among search engines, user-generated content and user-generated encyclopedias, the findings here seem to suggest similar geographic and linguistic dynamics.
The clear outcome of “network gatekeeping”, identified through Chinese search engine variants and their respective preferred encyclopedias, may point to a larger online context for Chinese Internet users across regions. For future research, it will be useful to examine how geographic and linguistic factors may influence the network gatekeeping processes inside user-generated encyclopedias (Liao, 2009). It is likely that they also exercise the gatekeeping bases of “display” and “localization”, as search engines do.

The overall method can be systematically extended to other contexts. Search engine variants can be chosen for research on almost all other languages in the world, including languages with transnational adoption such as Arabic, Hindi, Tamil, English, Spanish and Portuguese. Researchers can thus further interpret the merging and diverging SERP outcomes for research questions relevant to global, transnational or inter-cultural communication on the one hand, and for questions in human-computer interaction and information systems on the other. Also, the focus on geo-linguistic factors as important variables for understanding search engines can contribute to the development of geo-linguistic analysis of the Web (Liao & Petzold, 2011; Petzold & Liao, 2011). It can also be adopted for market and industry applications in which geo-linguistic identifiers are central (DePalma, 2002; Dunne, 2006).

In conclusion, the proposed method has the potential for a wide range of market and academic applications. The theoretical implications may be extended to other websites or information systems that produce or curate different outcomes based on the geographic and linguistic preferences (or configurations) of users. The method highlights the role of geo-linguistic parameters as media “access codes”, or set patterns of access to information, as articulated by medium theorists for TV research (Meyrowitz, 1986, 1994), or as the “network gatekeeping process” theorized in information science theory (Barzilai-Nahon, 2008). Localization has become the new medium that carries the (higher-level) messages of cultural-political integration, reintegration or fragmentation of users.

7. ACKNOWLEDGMENTS
This work was supported by the Taiwan National Science Council’s Taiwan Merit Scholarships Program (NSC-095-SAF-I-564-028-TMS) and supported in part by the Oxford Internet Institute Scholarship. Special thanks to Ralph Schroeder, Bernie Hogan, Scott Hales and Min Jiang for their advice and support.

8. REFERENCES
Aragón, P., Kaltenbrunner, A., Laniado, D., & Volkovich, Y. (2012). Biographical Social Networks on Wikipedia - A cross-cultural study of links that made history. In Proceedings of WikiSym 2012. Retrieved from Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D. (2012). Omnipedia: bridging the Wikipedia language gap. In Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems (pp. 1075–1084). Retrieved from Bar‐Ilan, J. (2006). Web links and search engine ranking: The case of Google and the query “jew.” Journal of the American Society for Information Science and Technology, 57(12), 1581–1589. doi:10.1002/asi.20404 Barzilai-Nahon, K. (2008). Toward a theory of network gatekeeping: A framework for exploring information control. Journal of the American Society for Information Science and Technology, 59(9), 1493–1512. doi:10.1002/asi.20857 Battelle, J. (2005). The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture (First Edition.). Portfolio Hardcover. BBC. (2011, March 31). Google’s China exit “exaggerated.” BBC. Retrieved from Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven and London: Yale University Press. Retrieved from Bermejo, F. (2009). Audience manufacture in historical perspective: from broadcasting to Google.
New Media & Society, 11(1-2), 133 –154. doi:10.1177/1461444808099579 Brettel, M., & Spilker-Attig, A. (2010). Online advertising effectiveness: a cross-cultural comparison. Journal of Research in Interactive Marketing, 4(3), 176–196. doi:10.1108/17505931011070569 Charlton, G. (2012, February 13). Why Wikipedia is top on Google: the SEO truth no-one wants to hear. Econsultancy: Digital Marketers United. Retrieved from google-the-seo-truth-no-one-wants-to- hear?utm_campaign=bloglikes&utm_medium=socialnetwork& utm_source=facebook Chau, M., Fang, X., & Yang, C. C. (2007). Web searching in Chinese: A study of a search engine in Hong Kong. Journal of the American Society for Information Science and Technology, 58(7), 1044–1054. doi:10.1002/asi.20592 Chen, J. (2008). Essays on auction mechanisms and resource allocation in keyword advertising (The University of Texas at Austin). ProQuest. CIC. (2009). China Search Engine Market Report 2009. Beijing, China: China IntelliConsulting Corporation. Retrieved from CNNIC. (2006, September 16). Chinese Search Engine Market Survey Report 2006. China Internet Network Information Center. Retrieved November 19, 2011, from CNNIC. (2007, September 26). 2007 Survey Report on Search Engine Market in China. China Internet Network Information Center. Retrieved November 19, 2011, from CNNIC. (2009, March 5). China Search Engine Report 2008 Advertisers and Users Behavior Study. (中国搜索引擎市场广 告主与用户行为研究报告). Retrieved November 19, 2011, from Couvering, E. V. (2004). New Media? The Political Economy of Internet Search Engines. Presented at the International Association of Media & Communications Researchers, Porto Alegre, Brazil. Retrieved from 1900 Couvering, E. V. (2008). The History of the Internet Search Engine: Navigational Media and the Traffic Commodity. In A. Spink & M. Zimmer (Eds.), Web Search (Vol. 14, pp. 177–206). Berlin, Heidelberg: Springer Berlin Heidelberg. Retrieved from Čuhalev, J. (2006). 
Ranking of Wikipedia articles on search engines for searches about its own articles (Seminar Task for Internet Search Techniques and Business Intelligence class) (p. 7). Retrieved from wikipedia-in-your-google-searches/ Dahlberg, L. (2005). The Corporate Colonization of Online Attention and the Marginalization of Critical Communication? Journal of Communication Inquiry, 29(2), 160 –180. doi:10.1177/0196859904272745 Damm, J. (2007). The Internet and the fragmentation of Chinese society. Critical Asian Studies, 39, 273–294. doi:doi:10.1080/14672710701339485 DePalma, D. A. (2002). Internationalization and Localization. In Business without borders: a strategic guide to global marketing. New York: John Wiley and Sons. Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized blockmodeling of two-mode network data. Social Networks, 26(1), 29–53. doi:10.1016/j.socnet.2004.01.002 Dunleavy, P., Margetts, H., Bastow, S., Pearce, O., & Tinkler, J. (2007). Government on the internet: progress in delivering information and services online. UK: National Audit Office. Retrieved from 07/0607529.pdf Dunne, K. J. (2006). Perspectives on Localization. John Benjamins Publishing Company. Dutta, S., Dutton, W. H., & Law, G. (2011). The New Internet World: A Global Perspective on Freedom of Expression, Privacy, Trust and Security Online. SSRN eLibrary. Retrieved from Einhorn, B. S., Bruce. (2010, November 11). How Baidu Won China. BusinessWeek: Online Magazine. Retrieved from 060242597_page_6.htm Enquiro. (2007, June 15). Chinese Eye Tracking Study: Baidu Vs Google. Retrieved July 9, 2009, from vs-google-11477
Fallows, D. (2008). Search Engine Use. Pew Research Center’s Internet & American Life Project. Retrieved November 19, 2011, from Engine-Use.aspx Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Commun. ACM, 35(12), 61–70. doi:10.1145/138859.138867 Gray, M. (2007, May). Google Love Affair with Wikipedia - Graywolf’s SEO Blog. Graywolf’s SEO Blog. Retrieved December 2, 2011, from http://www.wolf- Hargittai, E. (2007). The Social, Political, Economic, and Cultural Dimensions of Search Engines: An Introduction. Journal of Computer‐Mediated Communication, 12(3), 769–777. Hearne, R. (2006, August 12). SERP Click Through Rate of Google Search Results – AOL-data.tgz – Want to Know How Many Clicks The #1 Google Position Gets? Red Cardinal. Retrieved December 2, 2011, from engine-optimisation/12-08-2006/clickthrough-analysis-of-aol- datatgz/ Hecht, B., & Gergle, D. (2010). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the 28th international conference on Human factors in computing systems (pp. 291–300). Retrieved from Hopkins, H. (2009, January 23). Britannica 2.0: Wikipedia Gets 97% of Encyclopedia Visits. Hitwise Intelligence: Analyst Weblog. Retrieved from hopkins/2009/01/britannica_20_wikipedia_gets_9.html Hussain, S., & Mohan, R. (2008). Localization in Asia Pacific. In F. Librero & P. B. Arinto (Eds.), Digital Review of Asia Pacific 2007/2008. Orbicom and the International Development Research Centre (IDRC). Retrieved from IDATE. (2011). World Internet Usage & Markets. IDATE Consulting and Research. Retrieved from Data-Reports_23/World-Internet-Usage-Markets_584.html Jansen, B. J., Brown, A., & Resnick, M. (2007). Factors relating to the decision to click on a sponsored link. Decision Support Systems, 44(1), 46–59. doi:10.1016/j.dss.2007.02.009 Jansen, B. J., & Mullen, T. (2008).
Sponsored search: an overview of the concept, history, and technology. Int. J. Electronic Business, 6(2), 114–131. Jansen, J. (2011). Understanding Sponsored Search: Core Elements of Keyword Advertising. Cambridge University Press. Jiang, M., & Akhtar, A. (2011). Peer into the Black Box of Chinese Search Engines: A Comparative Study of Baidu, Google, and Goso. Presented at the The 9th Chinese Internet Research Conference (CIRC 2011), Washington, D.C.: Institute for the Study of Diplomacy. Georgetown University. Jones, R. (2007, June 26). 96.6% of Wikipedia Pages Rank in Google’s Top 10. The Google Cache: Search Engine Marketing, SEO & PPC. Retrieved December 2, 2011, from wikipedia-pages-rank-in-googles-top-10/ Jucquois-Delpierre, M. (2007). Fictional reality or real fiction: how can one decide?: The strengths and weaknesses of information science concepts and methods in the media world. Journal of Information, Communication & Ethics in Society, 5(2/3), 235– 252. doi:10.1080/14616700306488 Jung, G. (2008). The Increasing Relevance of Online Marketing. GRIN Verlag. Khanna, A. (2011, October 26). Google drives traffic to Wikipedia, but half of readers look for Wikipedia content — Wikimedia blog. Wikimedia Foundation: Global blog. Official blog. Retrieved from and-wikipedia/ Liao, H.-T. (2008). A webometric comparison of Chinese Wikipedia and Baidu Baike and its implications for understanding the Chinese-speaking Internet. In 9th annual Internet Research Conference: Rethinking Community, Rethinking Place. Copenhagen. Liao, H.-T. (2009). Conflict and Consensus in the Chinese version of Wikipedia. IEEE Technology and Society Magazine, 28(2), 49–56. doi:10.1109/MTS.2009.932799 Liao, H.-T. (2011). Needing to Have a Voice: Linguisitc Grouping in the Digital Networked Environment (ISD Working Papers in New Diplomacy). Washington, D.C.: Institute for the Study of Diplomacy. Georgetown University. Retrieved from %20Voice.pdf Liao, H.-T. (2013a). 
How does Chinese localization influence online visibility? A study on Chinese-language Search Engine Result Pages (SERPs). (Accepted). To be presented at the 11th Annual Chinese Internet Research Conference (CIRC 2013), Oxford, UK. Liao, H.-T. (2013b). “Online Encyclopedia” (网上/网络百科全书 ), “User Generated Content” (用户生成内容). In (L. Cheng, Ed.)The Internet in China: An Encyclopedic Handbook of Online Business, Information Distribution, and Social Connectivity. Berkshire Publishing. Liao, H.-T., & Petzold, T. (2011). Analysing geo-linguistic dynamics of the World Wide Web: The use of cartograms and network analysis to understand linguistic development in Wikipedia. Cultural Science, 3(2). Luyt, B., Goh, D., & Lee, C. S. (2009). Searching locally: a comparison of Yehey! and Google. Online Information Review, 33(3), 499–510. Malaga, R. A. (2008). Worst practices in search engine optimization. Commun. ACM, 51(12), 147–150. doi:10.1145/1409360.1409388 Margetts, H. Z., & Escher, T. (2006). Governing from the Centre? Comparing the Nodality of Digital Governments. SSRN eLibrary. Retrieved from Massa, P., & Scrinzi, F. (2012). Manypedia: Comparing Language Points of View of Wikipedia Communities. In Proceedings of WikiSym 2012. Retrieved from p13wikisym2012.pdf Mazieres, A., & Huron, S. (2013). Toward Google Borders. Presented at the Web Science. Retrieved from McKenna, M. G., & Naftulin, H. (2000). Challenges in the multicultural HCI development environment. In CHI ’00 extended abstracts on Human factors in computing systems (pp. 362–362). New York, NY, USA: ACM. doi:10.1145/633292.633509 Meyrowitz, J. (1986). No sense of place : the impact of electronic media on social behavior. New York ; Oxford: Oxford University Press. Meyrowitz, J. (1994). Medium theory. In D. Crowley & D. Mitchell (Eds.), Communication Theory Today. Stanford University Press.
  12. 12. Morris, M., & Ogan, C. (2002). The Internet as Mass Medium. In D. McQuail (Ed.), McQuail’s reader in mass communication theory (pp. 134–145). London: SAGE.
      Nguyen, C. (2011, March). Search Engine Market share by country. Chandler Nguyen Digital Marketing Blog. Retrieved December 1, 2011, from engine-market-share-by-country-mar-2011.html
      Nielsen Online. (2008). Wikipedia U.S. Web Traffic Grows 8,000 Percent In Five Years, Driven By Search. New York: Nielsen Online. Retrieved from from-Google-85703.shtml
      Petzold, T., & Liao, H.-T. (2011). Geo-linguistic analysis of the World Wide Web: The use of cartograms and network analysis to understand linguistic development in Wikipedia. In D. Araya, Y. Breindl, & T. J. Houghton (Eds.), Nexus: New Intersections in Internet Research (pp. 55–75). New York: Peter Lang.
      Petzold, T., Liao, H.-T., Hartley, J., & Potts, J. (2012). A world map of knowledge in the making: Wikipedia’s inter-language linkage as a dependency explorer of global knowledge accumulation. Leonardo: Art, Science and Technology, 45(3), 284–284. doi:10.1162/LEON_a_00376
      PricewaterhouseCoopers. (2011). IAB Internet Advertising Revenue Report. New York; DC: The Interactive Advertising Bureau. Retrieved from
      Qiang, X. (2010, July 23). User-generated content online now 50.7% of total. China Daily. Beijing. Retrieved from 07/23/content_11042851.htm
      Rogers, R., & Sendijarevic, E. (2012). Neutral or National Point of View? A Comparison of Srebrenica articles across Wikipedia’s language versions. In Wikipedia Academy: Research and Free Knowledge (#wpac2012). Berlin. Retrieved from http://wikipedia- mina_Sendijarevic.pdf
      Russell, J. (2011). Why Yahoo! –not Google– rules Taiwan’s webspace. Asian Correspondent. Retrieved December 1, 2011, from where-yahoo-not-google-rules-the-countrys-webspace/
      Segev, E. (2008). Search Engines and Power: A Politics of Online (Mis-) Information. Retrieved November 19, 2011, from
      SEMPO. (2011). SEMPO State of Search Marketing Report 2011. SEMPO Institute. Retrieved from
      Silverwood-Cope, S. (2012, February 8). Wikipedia: Page one of Google UK for 99% of searches. Intelligent Positioning Blog. Retrieved from -page-one-of-google-uk-for-99-of-searches/
      Slingshot SEO. (2011). Google & Bing Click-Through Rates (White paper). Retrieved from ctr-study/
      Spindler, S. (2010). Online Marketing: How to Increase International Sales with Search Engine Optimisation. GRIN Verlag.
      StatCounter. (2011). Top 5 Search Engines in China/Hong Kong/Singapore/Taiwan from Nov 2010 to Nov 2011. StatCounter Global Stats. Retrieved December 1, 2011, from 201011-201111
      Sunstein, C. R. (2002). Fragmentation and Cybercascades. In Republic.Com. Princeton University Press.
      The Cambridge encyclopedia of China. (1991) (2nd ed.). Cambridge [England]; New York: Cambridge University Press.
      Goldsmith, J., & Wu, T. (2006). Who Controls the Internet? Illusions of a Borderless World. Oxford University Press.
      Varian, H. R. (2007). The Economics of Internet Search. Presented at the Angelo Costa lecture, Rome. Retrieved from lecture.pdf
      Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias: evidence and possible causes. Information Processing & Management, 40(4), 693–707.
      Vaughan, L., & Zhang, Y. (2007). Equal Representation by Search Engines? A Comparison of Websites across Countries and Domains. Journal of Computer-Mediated Communication, 12(3). Retrieved from
      Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network. Retrieved from
      Yang, Y. (2011, February 25). China’s “Wikipedia” Submits Complaint about Baidu. Economic Observer News, 508, 28.
      Young, R. D. (2011, August 10). Top Google Ranking Captures 18.2% of Clicks. Search Engine Watch (#SEW). Retrieved December 2, 2011, from Ranking-Captures-18.2-of-Clicks-Study
      Zhao, S., & Baldauf, R. B. J. (2007). Planning Chinese Characters: Reaction, Evolution or Revolution? Springer.