Mapping social, political, and scientific landscape using webometrics method
Assoc. Prof. Han Woo PARK
Department of Media & Communication, YeungNam University
214-1 Dae-dong, Gyeongsan-si, Gyeongsangbuk-do 712-749, Republic of Korea
[email_address]
http://www.hanpark.net
http://english-webometrics.yu.ac.kr
http://asia-triplehelix.org
Thanks to my colleagues and students at the WWI.
Virtual Knowledge Studio (VKS)
Invited speech, Department of Media & Communication, City University of Hong Kong, 29 March 2010
(Topic: Mapping social, political, and scientific landscape using webometric method)
1. Development of webometrics tools to automate the social Internet research process (e.g., data collection and analysis from search engines, SNS, and microblogging sites)
2. Experimentation with new types of data visualization across periods and platforms (e.g., dynamic mappings using HNA)
Webometrics in terms of e-research
A minor but growing approach to the study of Internet-mediated communication
A new methodological perspective based on the use of new digital tools available online for conducting humanities and social science Internet research
Return data from the web in a form suitable for import into Excel, SPSS, LexiURL, etc.
The returned data contain all values. Only some may be relevant to the current query, but keeping all of the data ensures that you can revisit it later if another project requires more variables.
All programs have time-rests (pauses between requests), though these vary depending on the service being accessed.
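The collection pattern described above can be sketched roughly as follows. This is a minimal illustration, not the institute's actual tooling: `collect`, `to_csv`, and `fake_fetch` are hypothetical names, and the pause length is an assumed parameter that would differ per service.

```python
import csv
import io
import time


def collect(queries, fetch, pause_seconds=1.0, sleep=time.sleep):
    """Call fetch(query) for each query, resting between consecutive calls."""
    rows = []
    for i, query in enumerate(queries):
        if i > 0:
            sleep(pause_seconds)  # the "time-rest" between requests
        rows.append(fetch(query))
    return rows


def to_csv(rows, fieldnames):
    """Write every returned field to CSV text for import into Excel or SPSS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_fetch(query):
    """Hypothetical stand-in for a real search-engine or SNS request."""
    return {"query": query, "hits": len(query), "source": "demo"}


rows = collect(["naver", "cyworld"], fake_fetch, pause_seconds=0.0)
csv_text = to_csv(rows, ["query", "hits", "source"])
print(csv_text)
```

All fields are written out, including ones the current query does not need, so the file can be reused by a later project.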
Webonaver (Webometrics Tool for Naver) (Image Source: Newsweek, 5 Nov 2007)
WCU WEBOMETRICS INSTITUTE INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS
“Korea’s Naver is now the world’s 5th search service provider, behind Google, Yahoo, Baidu and Microsoft.” (The AP, 9 Oct 2007)
“Google left behind as Koreans Naver-gate the internet” (Financial Times, 2 Jan 2008)
“In South Korea people who want to look something up on the internet don’t ‘Google it’. Instead they ‘ask Naver’.” (Economist, 30 Feb 2009)
Lee, Y.-O., & Park, H. W. (2008). "The Importance of Search Engines in Digital News Consumption: A Comparative Study Between South Korea and the UK". Refereed paper presented at the workshop “Gatekeepers in a Digital Asian-European Media Landscape: The rising structural power of Internet search engines”.
Components of Naver: log-in, article titles (changing automatically), linked press outlets, today’s issues, quick menu, browser window
Collects profile information from the public messages posted to an initial seed user
Takes approximately 10 seconds per user request
Stores user details so subsequent calls are not needed
As a result of the high numbers of comments on some Cyworld pages, the process of collecting the data can take several days
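The idea of storing user details so that repeat lookups are avoided can be illustrated with a small cache. This is only a sketch under assumptions: `ProfileCache` and `fake_fetch_profile` are hypothetical names, and the real tool is Java-based and may persist data differently.

```python
class ProfileCache:
    """Remember each fetched profile so a user is requested only once."""

    def __init__(self, fetch_profile):
        self._fetch = fetch_profile
        self._store = {}
        self.remote_calls = 0  # counts the slow (roughly 10-second) requests

    def get(self, user_id):
        if user_id not in self._store:
            self.remote_calls += 1
            self._store[user_id] = self._fetch(user_id)
        return self._store[user_id]


def fake_fetch_profile(user_id):
    """Hypothetical stand-in for a real profile request."""
    return {"id": user_id, "gender": "unknown"}


cache = ProfileCache(fake_fetch_profile)
commenters = ["u1", "u2", "u1", "u1"]  # the same user often comments repeatedly
profiles = [cache.get(user) for user in commenters]
print(cache.remote_calls)
```

With many repeat commenters on a page, a cache of this kind reduces the number of slow requests to one per distinct user.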
Cyworld Extractor - Overview
A Java-based software tool that, given the URL of a politician on Cyworld, extracts comments left by citizens along with related profile attributes. The collected data, which can amount to thousands of records, is stored in a format suitable for import into statistical software.
<Figure> Geun-Hye Park’s mini-hompy. Labeled fields: status of the mini-hompy (① how active, ② how famous, ③ how friendly), gender, name, visitor count
<Figure> Cyworld IP address screen capture from Seong-Min Yoo’s mini-hompy
Cyworld Extractor - Data
One possible use of the collected data is to determine the region of posters commenting from within Korea.
Cyworld Extractor - Data
It is also possible to determine the country of origin of users commenting from outside Korea.
Case 2. Cyworld Mini-hompies of Korean Legislators
Cyworld mini-hompies of Korean legislators: co-inlink network map using Yahoo.com
However, buddy data is not publicly available!
The network structure using co-link data shows a clear butterfly pattern. There is one hub (ghism) that belongs to Park Geun-Hye (Park GH, www.cyworld.com/ghism), the daughter of ex-president Park Jeong-Hee and one of two major GNP candidates (along with president-elect Lee MB) in the 2007 presidential race.
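A co-inlink tie between two sites is usually counted as the number of pages that link to both. The sketch below shows that counting step only; the site and page names are invented for illustration, and it assumes the inlink lists have already been collected from a search engine.

```python
from collections import Counter
from itertools import combinations


def co_inlink_counts(inlinks):
    """inlinks maps each target site to the set of pages that link to it."""
    counts = Counter()
    for a, b in combinations(sorted(inlinks), 2):
        shared = len(inlinks[a] & inlinks[b])  # pages linking to both a and b
        if shared:
            counts[(a, b)] = shared
    return counts


# Hypothetical data: three mini-hompies and the pages linking to each.
inlinks = {
    "hompy_a": {"blog1", "blog2", "news1"},
    "hompy_b": {"blog2", "news1"},
    "hompy_c": {"forum1"},
}
edges = co_inlink_counts(inlinks)
print(dict(edges))
```

The resulting weighted pairs can be loaded into a network package as an edge list to draw a map like the one described above.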
Collects follower/following lists and tweets from a chosen user
Is subject to a 150-hit rate limit imposed by Twitter
When the rate limit is reached, the program pauses and shows an indeterminate progress dialog until the rate limit renews
Users can log in with their Twitter credentials, which can optionally be stored for a future session
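The pause-until-renewal behavior can be sketched with a simple fixed-window limiter. This is an illustration rather than the extractor's actual code: the one-hour window is an assumption (the slide gives only the 150-hit figure), and `RateLimiter` and `FakeClock` are hypothetical names.

```python
import time


class RateLimiter:
    """Allow at most `limit` calls per window, sleeping when the budget is spent."""

    def __init__(self, limit=150, window_seconds=3600,
                 clock=time.monotonic, sleep=time.sleep):
        self._limit = limit
        self._window = window_seconds  # assumed window length
        self._clock = clock
        self._sleep = sleep
        self._window_start = clock()
        self._used = 0

    def acquire(self):
        now = self._clock()
        if now - self._window_start >= self._window:
            self._window_start, self._used = now, 0
        if self._used >= self._limit:
            # Budget spent: wait for the remainder of the window, then reset.
            self._sleep(self._window - (now - self._window_start))
            self._window_start, self._used = self._clock(), 0
        self._used += 1


class FakeClock:
    """Test double so the demo does not really wait."""

    def __init__(self):
        self.now = 0.0
        self.naps = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.naps.append(seconds)
        self.now += seconds


fake = FakeClock()
limiter = RateLimiter(limit=3, window_seconds=60, clock=fake.time, sleep=fake.sleep)
for _ in range(4):
    limiter.acquire()  # the fourth call has to wait for the window to renew
print(fake.naps)
```

In a real client, `acquire()` would be called before each request, and the sleep is the point at which a progress dialog would be shown.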
Twitter Extractor - Overview
Sharing a similar interface and extraction mechanism with the Cyworld Extractor, this application requires the URL of a user on Twitter. It is then possible to collect all tweets and determine the attributes of the user’s follower/following network.
Twitter Extractor - Data
A simple use for this data would be to visualize a user’s network and ascertain which users are reciprocal in their friendships.
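Reciprocal friendships are the accounts that appear in both the follower list and the following list. A minimal sketch, with invented account names:

```python
def reciprocal_ties(followers, following):
    """Return accounts that both follow the user and are followed back."""
    return set(followers) & set(following)


# Hypothetical lists, as a follower/following collection step might return them.
followers = ["alice", "bob", "carol"]
following = ["bob", "carol", "dave"]

mutual = reciprocal_ties(followers, following)
one_way_in = set(followers) - mutual   # follow the user, not followed back
one_way_out = set(following) - mutual  # followed by the user, no follow back
print(sorted(mutual))
```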
A case study on the Twitter use of 18th National Assembly members
* Types of tweets
* Audiences of tweets
* Topics of tweets
Korean Internet Network Miner: A Korean version of ICTA
Section 1. Development of the Korean Internet Network Miner
After the blog data was retrieved, it was processed to build two types of networks.
First, a chain network was extracted. In the chain network, one
commentator is connected to another if the first commentator directly
replied to the second commentator by clicking on the "reply-to" button.
However, after manually examining a number of comments on several
blogs, we found that some comments are not "reply-to"
comments but still address or reference a previous poster.
To capture these missing connections, we decided to rely on another network
discovery method called the name network.
This observation is in line with a previous empirical study on online learning communities by Gruzd (2009a), which found that the chain network misses on average 40% of possible connections.
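The difference between the two networks can be illustrated on a toy comment thread. This is a deliberately naive sketch, not the name network discovery algorithm itself: it only matches earlier commentators' nicknames as whole words, and the comment records and field names are invented.

```python
import re

# Hypothetical comment records; "reply_to" holds the id of the parent comment.
comments = [
    {"id": 1, "author": "mina", "reply_to": None, "text": "First post"},
    {"id": 2, "author": "joon", "reply_to": 1, "text": "I agree"},
    {"id": 3, "author": "sora", "reply_to": None, "text": "mina, good point"},
]


def chain_network(comments):
    """Connect a commentator to the author of the comment they replied to."""
    author_of = {c["id"]: c["author"] for c in comments}
    return {
        (c["author"], author_of[c["reply_to"]])
        for c in comments
        if c["reply_to"] in author_of
    }


def name_network(comments):
    """Connect a commentator to earlier commentators named in the text."""
    edges = set()
    seen_authors = set()
    for c in comments:
        words = set(re.findall(r"\w+", c["text"].lower()))
        for name in seen_authors:
            if name != c["author"] and name.lower() in words:
                edges.add((c["author"], name))
        seen_authors.add(c["author"])
    return edges


chain_edges = chain_network(comments)
name_edges = name_network(comments)
print(chain_edges, name_edges)
```

Here the third comment addresses "mina" without using the reply-to button, so only the name network picks up that tie.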
Another good example of challenges associated with the name/nickname
disambiguation problem in comments is the word "2mb".
This is because "2mb" has at least three different meanings.
First, this word can be used as a nickname for one of the blog commentators.
Second, it could refer to the capacity of a computer memory (2 megabytes).
Finally, it could be the alias of the current Korean president, Lee Myung-Bak.
To address these challenges and develop recommendations for the next
generation of the name network discovery algorithm, we conducted a
semi-automated analysis of all names/nicknames discovered from a
sample dataset using our initial algorithm.
Section 2. Evaluation of the Name Network Discovery Algorithm
The evaluation procedure involved clicking on each word found by the
name network algorithm and exploring the context where each instance
of the word was used (see Figure 3). The purpose of this semi-automated
analysis was to discover what name/nickname candidates were identified
incorrectly and why.
<Figure 3> A list of messages containing "2MB"
This semi-automated analysis revealed a set of additional syntactic and
semantic clues that can be used to improve the accuracy of the name
network discovery algorithm.
The second set includes clues suggesting that a word is NOT likely to be used as a nickname:
● a word candidate is a phrase, for example, if the nickname input (the "FROM" field) is used more like a subject line (possible indicators include white spaces and length);
● a word candidate consists of a single character (e.g., "a" or "ㄱ");
● a word candidate consists of netspeak, including emoticons (e.g., "=_="), slang and abbreviations (e.g., using "2MB" to refer to the current Korean president), and onomatopoeia (e.g., "ㅉㅉ" = tsk tsk, "ㅋㅋ" = heehee, "하하" = haha, "음" = hmm);
● a word candidate appears more than one time in the comment;
● a word candidate consists of random characters (e.g., "ㅁㄴㅇㄹ" or "asdf");
● a word candidate is a short, conversational word or phrase (e.g., "나나" = me, "아이고" = oh no, "그래서" = so/therefore);
● a word candidate is a common word or idea in the given context/topic (e.g., "대한민국" = Republic of Korea, "쥐체사상" = a newly created word used to refer to political fanatics).
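The clues above could be turned into a simple rule-based filter along these lines. This is a rough sketch under stated assumptions: the word lists contain only the examples from the slide, the length threshold is an arbitrary placeholder, random-character strings are matched by lookup rather than detected, and `unlikely_nickname` is a hypothetical helper rather than part of the actual tool.

```python
# Small lookup lists built from the examples above (assumed, not exhaustive).
NETSPEAK = {"2mb", "=_=", "ㅉㅉ", "ㅋㅋ", "하하", "음"}
RANDOM_STRINGS = {"ㅁㄴㅇㄹ", "asdf"}
CONVERSATIONAL = {"나나", "아이고", "그래서"}
COMMON_TOPIC_WORDS = {"대한민국", "쥐체사상"}
MAX_NICKNAME_LENGTH = 20  # assumed threshold for "used like a subject line"


def unlikely_nickname(candidate, comment_text):
    """Return the first clue that rules the candidate out, or None."""
    word = candidate.strip()
    lowered = word.lower()
    if any(ch.isspace() for ch in word) or len(word) > MAX_NICKNAME_LENGTH:
        return "phrase"
    if len(word) == 1:
        return "single character"
    if lowered in NETSPEAK:
        return "netspeak"
    if comment_text.lower().count(lowered) > 1:
        return "repeated in comment"
    if lowered in RANDOM_STRINGS:
        return "random characters"
    if lowered in CONVERSATIONAL:
        return "conversational word"
    if lowered in COMMON_TOPIC_WORDS:
        return "common topic word"
    return None


print(unlikely_nickname("2MB", "2MB again"))
print(unlikely_nickname("mina", "mina, good point"))
```

A candidate that passes every check is not confirmed as a nickname; it is only not excluded by this second set of clues.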
Party policies affected the use of the Web for political purposes
Progressive/minor groups more willing to explore alternative media
Incoming International Hyperlinks in 2009 (drawn using ManyEyes.com)
Incoming International Hyperlinks in 2009 (drawn using Google Earth)
Thank you for listening!
Acknowledgments. The WCU Webometrics Institute acknowledges that this research is supported by the WCU project on investigating Internet-based politics using e-research tools, funded by the South Korean Government.