Mapping social, political and scientific landscape using webometrics (City Univ. of Hong Kong, 24 March 2010)
 

  • Search results as of 9 March 2010
  • http://english.hani.co.kr/arti/english_edition/e_national/386276.html
  • http://www.atimes.com/atimes/Korea/FK25Dg01.html

Presentation Transcript

  • Mapping the social, political, and scientific landscape using the webometrics method
    Assoc. Prof. Han Woo PARK, Department of Media & Communication, YeungNam University, 214-1 Dae-dong, Gyeongsan-si, Gyeongsangbuk-do 712-749, Republic of Korea
    [email_address] | http://www.hanpark.net | http://english-webometrics.yu.ac.kr | http://asia-triplehelix.org
    Thanks to my colleagues and students at the WWI.
    Virtual Knowledge Studio (VKS)
    • Invited speech, Department of Media & Communication, City University of Hong Kong, 29 March 2010
    • (Topic: Mapping the social, political, and scientific landscape using the webometrics method)
  • Outline of presentation
    • 1. Development of webometrics tools to automate the social Internet research process (e.g., data collection and analysis from search engines, SNSs, and microblogging sites)
    • 2. Experimentation with new types of data visualization across periods and platforms (e.g., dynamic mappings using hyperlink network analysis (HNA))
  • Webometrics in terms of e-research: a minor but growing approach to the study of Internet-mediated communication, and a new methodological perspective based on the use of new digital tools available online for conducting humanities and social science Internet research
  • Research tradition of Webometrics
    • 1) Development of online tools to automate the Internet research process, such as data collection and analysis
    • 2) Experimentation with new types of data visualization, such as social network and hyperlink analysis and multimedia and dynamic mappings
  • http://participatorysociety.org/wiki/index.php?title=Online_Research
  • Web Scrapers, Crawlers, Tools in WCU
  • Overview
    • Collecting data from search engines: Naver.com, Google.com
    • Mining social networking services: Cyworld mini-hompies, Facebook, Plurk
    • Microblogging sites: Twitter, TwtKr.com
    • Korean Internet Network Miner: a Korean version of Dr. A. Gruzd’s ICTA
    • Web archiving of Korean MPs: http://www.web-archive.kr/
    • The tools are at various stages of development
    • They return data from the web in a form suitable for import into Excel, SPSS, LexiURL, etc.
    • Returned data contain all available values. Only some of these may be relevant to the current query, but keeping all of the data means you can revisit it later if another project requires more variables
    • All programs pause between requests, though the length of the pause varies depending on the service being accessed
    • The purpose of this paper is to introduce WeboNaver, an API-based webometrics tool created for the Korean search engine Naver. This non-commercial software is designed to collect large amounts of data automatically and can easily distinguish between different types of information on the web, which was not possible before.
    WeboNaver (Webometrics Tool for Naver) (Image source: Newsweek, 5 Nov 2007)
    WCU WEBOMETRICS INSTITUTE: INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS
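The overview says the tools return every value in a spreadsheet-ready form and pause between requests, with the pause depending on the service. A minimal sketch of that pattern follows; the `fetch_record` callable and the per-service delays in `SERVICE_DELAYS` are hypothetical placeholders, not the WCU tools' actual code or settings.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterable, List, TextIO

# Hypothetical pauses (seconds) per service; the tools' real settings are not documented here.
SERVICE_DELAYS = {"naver": 1.0, "google": 3.0, "cyworld": 10.0}


def collect(queries: Iterable[str],
            fetch_record: Callable[[str], Dict[str, str]],
            service: str,
            out: TextIO,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch one record per query, pausing between requests, and write every field as CSV."""
    delay = SERVICE_DELAYS.get(service, 1.0)
    rows: List[Dict[str, str]] = []
    for i, query in enumerate(queries):
        if i > 0:
            sleep(delay)  # service-dependent pause between consecutive requests
        rows.append(fetch_record(query))
    # Keep the union of all returned keys so no value is discarded,
    # even if the current project only needs a few of them.
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Stand-in fetcher; a real tool would query the search engine or SNS here.
    def fake_fetch(q: str) -> Dict[str, str]:
        return {"query": q, "hits": str(len(q))}

    buf = io.StringIO()
    collect(["webometrics", "naver"], fake_fetch, "naver", buf, sleep=lambda s: None)
    print(buf.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` so the CSV module controls line endings; the resulting file can be opened directly in Excel or imported into SPSS.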
  • Rationale for choosing Naver
    • “Republic of Naver” (Kim & Sohn, 2007)
    • “Korea’s Naver is now the world’s 5th search service provider, behind Google, Yahoo, Baidu and Microsoft.” (The AP, 9 Oct 2007)
    • “Google left behind as Koreans Naver-gate the internet” (Financial Times, 2 Jan 2008)
    • “In South Korea, people who want to look something up on the internet don’t ‘Google it’. Instead they ‘ask Naver’.” (Economist, 30 Feb 2009)
    • Lee, Y.-O., & Park, H. W. (2008). The importance of search engines in digital news consumption: A comparative study between South Korea and the UK. Refereed paper presented at the Workshop “Gatekeepers in a Digital Asian-European Media Landscape: The rising structural power of Internet search engines”.
  • Components of the Naver front page: log-in; article titles (changing automatically); linked press outlets; today’s issues; quick menu; browser window
  • Naver search options
  • Interface
    • The interface is fairly self-explanatory:
    • Tick or untick to collect either only the hit count or the title, URL, and description of each result
    • Select which of the search options you want to include
    • Click on the '...' button to select the text file that contains the queries you wish to run
    • Click 'Run Queries'
  • http://english-webometrics.yu.ac.kr/WebometricsTools/WeboNaver/WeboNaver.html
  • U-I-G TH trend analysis (search area: title, content)
    • The web presence of the term H1N1 was examined using WeboNaver to test the usability and reliability of the tool.
    • Queries:
    • 신종플루 (Influenza A virus subtype H1N1)
    • 신 종 인 플루엔자 (Influenza A virus subtype H1N1)
    • 신 종인 플루엔자 (Influenza A virus subtype H1N1)
    • WeboNaver returns the same results for a query whether or not it contains spaces.
    • However, it does not treat similar words as equivalent, so users should decide which specific terms they want to extract before using the tool.
    Web presence of the term H1N1
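The spacing behaviour described above can be illustrated with a simple normalization key: queries that differ only in spacing collapse to one key, while a different word such as 신종플루 stays separate. This is an illustrative sketch of the observed behaviour, not WeboNaver's internal logic.

```python
from collections import defaultdict
from typing import Dict, List


def spacing_key(query: str) -> str:
    """Remove all whitespace so spacing variants of a query compare equal."""
    return "".join(query.split())


def group_by_spacing(queries: List[str]) -> Dict[str, List[str]]:
    """Group queries that differ only in spacing; distinct words stay in separate groups."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        groups[spacing_key(q)].append(q)
    return dict(groups)


if __name__ == "__main__":
    for key, variants in group_by_spacing(["신종플루", "신 종 인 플루엔자", "신 종인 플루엔자"]).items():
        print(key, variants)
```

Grouping queries this way before a run is one way to avoid collecting the same results twice, while still making the analyst choose explicitly between genuinely different terms.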
  • Monitoring a Socio-political Blogosphere in South Korea: Comparing Blogosphere Metrics with Voter Turnout
    • Data
      • Blog postings related to 29 candidates in the 2009 Korean National Assembly by-elections
    • Data gathering
      • Korean-language based blog search engine by Naver.com
      • Real-time blog monitoring program by WWI
      • Search queries: the name of candidate + “candidate”
      • Search date: postings after Oct. 8, 2009
      • Data collection period: Oct. 16 to Oct. 27, 2009 (12 days)
      • Cycle: twice a day (00:00 and 12:00)
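The collection window and twice-daily cycle can be expressed as a small schedule. This sketch assumes the monitor simply fires at 00:00 and 12:00 local time; how the WWI program actually scheduled its runs is not described.

```python
from datetime import datetime, timedelta
from typing import List

COLLECTION_HOURS = (0, 12)  # 00:00 and 12:00, as in the data-gathering setup


def next_collection_time(now: datetime) -> datetime:
    """Return the first scheduled collection time strictly after `now`."""
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in COLLECTION_HOURS:
        candidate = day + timedelta(hours=hour)
        if candidate > now:
            return candidate
    return day + timedelta(days=1, hours=COLLECTION_HOURS[0])


def collection_times(start: datetime, end: datetime) -> List[datetime]:
    """Every scheduled collection time in the inclusive range [start, end]."""
    times = []
    t = next_collection_time(start - timedelta(microseconds=1))
    while t <= end:
        times.append(t)
        t = next_collection_time(t)
    return times


if __name__ == "__main__":
    runs = collection_times(datetime(2009, 10, 16), datetime(2009, 10, 27, 12))
    print(len(runs), runs[0], runs[-1])
```

Over the 12-day window this yields up to 24 snapshots, one per half-day, which is the granularity of the trend curves that follow.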
  • Trend Analysis
    • Jangan district in Suwon City, Gyeonggi Province
    Figure: blog postings over time for Park, CS; Lee, CY; Ahn, DS; Yoon, JY
  • Blogs vs. Votes
    • Jangan district in Suwon City, Gyeonggi Province
    Figure: number of votes vs. number of blogs for Park, CS; Lee, CY; Ahn, DS; Yoon, JY
  • Blog postings and votes by candidate

    | Constituency | Candidate | Blog | Blog % | Blog rank | Votes | Vote % | Vote rank |
    |---|---|---|---|---|---|---|---|
    | Jangan, Suwon, Gyeonggi | Park, CS (박찬숙) | 213.4 | 35.6 | 2 | 33,106 | 42.7 | 2 |
    | | Lee, CY (이찬열) | 216.6 | 36.1 | 1 | 38,187 | 49.2 | 1 |
    | | Ahn, DS (안동섭) | 158.4 | 26.4 | 3 | 5,570 | 7.2 | 3 |
    | | Yoon, JY (윤준영) | 11.8 | 2.0 | 4 | 716 | 0.9 | 4 |
    | Sangrok-B, Ansan, Gyeonggi | Song, JS (송진섭) | 147.8 | 17.0 | 3 | 11,420 | 33.2 | 2 |
    | | Kim, YH (김영환) | 280.1 | 32.3 | 1 | 14,176 | 41.2 | 1 |
    | | Jang, KW (장경우) | 64.0 | 7.4 | 4 | 1,145 | 3.3 | 4 |
    | | Kim, SK (김석균) | 25.7 | 3.0 | 6 | 896 | 2.6 | 6 |
    | | Yoon, MW (윤문원) | 22.8 | 2.6 | 7 | 439 | 1.3 | 7 |
    | | Lee, YH (이영호) | 59.5 | 6.9 | 5 | 987 | 2.9 | 5 |
    | | Lim, JI (임종인) | 268.6 | 30.9 | 2 | 5,363 | 15.6 | 3 |
    | Gangreung, Gangwon | Kwon, SD (권성동) | 85.6 | 32.9 | 1 | 29,010 | 50.9 | 1 |
    | | Hong, JK (홍재경) | 68.0 | 26.1 | 3 | 2,100 | 3.7 | 4 |
    | | Song, YC (송영철) | 72.1 | 27.7 | 2 | 19,867 | 34.8 | 2 |
    | | Shim, KS (심기섭) | 34.9 | 13.4 | 4 | 6,054 | 10.6 | 3 |
    | North Chungcheong (4 districts) | Kyoung, DS (경대수) | 140.2 | 25.2 | 2 | 19,427 | 28.4 | 2 |
    | | Chung, BG (정범구) | 167.1 | 30.0 | 1 | 29,120 | 42.5 | 1 |
    | | Chung, WH (정원헌) | 65.2 | 11.7 | 5 | 3,071 | 4.5 | 4 |
    | | Park, KS (박기수) | 68.8 | 12.4 | 4 | 2,125 | 3.1 | 5 |
    | | Lee, TH (이태희) | 33.2 | 6.0 | 6 | 504 | 0.7 | 6 |
    | | Kim, KH (김경회) | 81.7 | 14.7 | 3 | 14,218 | 20.8 | 3 |
    | Yangsan, South Gyungsang | Park, HT (박희태) | 258.2 | 30.4 | 1 | 16,597 | 37.9 | 1 |
    | | Song, IB (송인배) | 214.2 | 25.2 | 2 | 15,577 | 35.6 | 2 |
    | | Park, SH (박승흡) | 134.0 | 15.8 | 3 | 1,550 | 3.5 | 5 |
    | | Kim, SG (김상걸) | 33.4 | 3.9 | 6 | 900 | 2.1 | 6 |
    | | Kim, YS (김양수) | 88.7 | 10.5 | 4 | 5,875 | 13.4 | 3 |
    | | Kim, YK (김용구) | 26.6 | 3.1 | 8 | 234 | 0.5 | 8 |
    | | Kim, JM (김진명) | 29.3 | 3.5 | 7 | 325 | 0.7 | 7 |
    | | Yoo, JM (유재명) | 64.3 | 7.6 | 5 | 2,710 | 6.2 | 4 |
  • Results
    • Correlation Analysis (N. of Blogs & N. of Votes)
      • Pearson r = .586, p < .01 (N=29)
      • Spearman rho = .797, p < .01 (N=29)
    • Simple Regression Analysis
      • N. of Votes = 1,055.56 + 79.99(N. of Blogs)
      • R² = .344 (F = 14.128, p < .01)
      • β = .586 (t = 3.759, p < .01)
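The three statistics reported above can be recomputed from paired blog and vote counts. This is a dependency-free sketch (a statistics package such as SciPy would normally be used, and p-values are not computed here); Spearman's rho is taken as Pearson's r on tie-averaged ranks, and which table column served as the blog measure in the original analysis is an assumption left to the reader.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope
```

With a single predictor, R² equals r² and F equals t², so the reported figures agree with each other up to rounding: .586² ≈ .343 and 3.759² ≈ 14.13. The standardized coefficient β also equals r (.586).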
  • Summary
    • Overall, the number of blog postings about candidates tended to increase over time.
    • In every district, the candidate with the most blog postings won the election.
    • Both correlation analyses (Pearson and Spearman) indicate a significant positive relationship between blog postings and votes.
    • In the simple regression analysis, the number of blog postings about a candidate is a significant predictor of the number of votes.
  • Cyworld
    • Collects profile information on users who post public messages to an initial seed user
    • Takes approximately 10 seconds per user request
    • Stores user details so subsequent calls are not needed
    • Because some Cyworld pages have very large numbers of comments, data collection can take several days
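Storing user details so that repeat commenters are not fetched again is a memoization cache, and it also gives a rough lower bound on collection time (one ~10-second request per distinct commenter). In this sketch `fetch_profile` is a hypothetical stand-in for the extractor's network call.

```python
from typing import Callable, Dict, Iterable


class ProfileCache:
    """Memoizes profile lookups so each user is fetched at most once."""

    def __init__(self, fetch_profile: Callable[[str], dict]):
        self._fetch = fetch_profile  # hypothetical network call (~10 s per request per the slide)
        self._profiles: Dict[str, dict] = {}
        self.requests = 0  # number of real fetches performed

    def get(self, user_id: str) -> dict:
        if user_id not in self._profiles:
            self._profiles[user_id] = self._fetch(user_id)
            self.requests += 1
        return self._profiles[user_id]


def estimate_hours(commenter_ids: Iterable[str], seconds_per_request: float = 10.0) -> float:
    """Lower-bound collection time: one request per distinct commenter."""
    return len(set(commenter_ids)) * seconds_per_request / 3600
```

For illustration only, a page whose comments come from 36,000 distinct users would need about 100 hours of requests, just over four days, which is in line with the "several days" noted above.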
  • Cyworld Extractor: overview. A Java-based software tool that, given the URL of a politician’s page on Cyworld, extracts comments left by citizens along with related profile attributes. The collected data, which can amount to thousands of records, is stored in a format suitable for import into statistical software.
  • Figure: Geun-Hye Park’s mini-hompy, annotated with the status of the mini-hompy (① how active, ② how famous, ③ how friendly), gender, name, and visitor count
    • After the South Korean government concluded negotiations on American beef imports in April 2008, there was considerable conflict between the government and public opinion during May and June 2008.
    • As the graph indicates, the number of comments on all Assembly members’ mini-hompies peaked in May and June 2008.
    • Among them, the mini-hompies of Kyung-Tae Jo and Kyeong-Won Na received the most comments.
  • South Koreans fearing 'mad cow disease' fight US beef imports in May and June 2008
  • Screen capture: commenter IP addresses shown on Seong-Min Yoo’s Cyworld mini-hompy
  • Cyworld Extractor: data. One possible use of the collected data is to determine the region of posters commenting from within Korea.
  • Cyworld Extractor: data. It is also possible to determine the country of origin of users commenting from outside Korea.
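Assuming each collected comment carries the commenter's IP address (as the screen capture of Seong-Min Yoo's mini-hompy suggests), region or country can be assigned by longest-prefix matching against an IP-to-location table. The table below is entirely hypothetical and uses reserved documentation ranges; a real study would rely on a maintained geolocation database, and this is not the extractor's own method.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical prefix-to-region table (RFC 5737 documentation ranges, for illustration only).
REGION_TABLE: Dict[str, str] = {
    "192.0.2.0/24": "Seoul",
    "198.51.100.0/24": "Daegu",
    "203.0.113.0/24": "Overseas: US",
}


def build_index(table: Dict[str, str]) -> List[Tuple[object, str]]:
    """Parse each prefix string into a network object paired with its region label."""
    return [(ipaddress.ip_network(prefix), region) for prefix, region in table.items()]


def lookup_region(ip: str, index) -> Optional[str]:
    """Region of the most specific matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, region) for net, region in index if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def tally(ips: Iterable[str], index) -> Counter:
    """Count commenters per region, with unmatched addresses grouped as 'Unknown'."""
    return Counter(lookup_region(ip, index) or "Unknown" for ip in ips)
```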
  • Case 2. Cyworld mini-hompies of Korean legislators: co-inlink network map using Yahoo.com. However, buddy data is not publicly available!
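In a co-inlink network, two legislators' mini-hompies are tied when a third-party page links to both, and the tie weight is the number of such shared linking pages. Assuming the link search has already produced, for each mini-hompy, the set of pages linking to it (how the Yahoo.com queries were run is not described), the weights can be computed as in this sketch.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def co_inlink_counts(inlinks: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """For each pair of target sites, count the pages that link to both."""
    counts: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(inlinks), 2):
        shared = len(inlinks[a] & inlinks[b])
        if shared:  # only keep pairs that actually share an inlinking page
            counts[(a, b)] = shared
    return counts


if __name__ == "__main__":
    example = {"mp1": {"x", "y"}, "mp2": {"y", "z"}, "mp3": {"q"}}
    print(co_inlink_counts(example))
```

Because co-inlinks only require public link data, this gives an indirect picture of relations between legislators even though buddy lists themselves are not public.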
  • Facebook
    • Searches for groups with links to petition sites
    • Stores group membership numbers
    • Queries petition site and stores number of signatures
    • Takes approximately 10 seconds per request
    • No graphical interface
  • Facebook
  • Plurk
    • Gathers the friends and fans lists of an initial seed user
    • Returns two text files: one containing friends and one containing fans
    • No graphical interface at present; all commands must be entered at a command prompt
    • Takes approximately 5 seconds per request
  • Plurk
  • Research examples on Plurk. Karma: the system gives each user a score indicating how active the user is (e.g., messaging, commenting, use of the system’s emoticons). When the mouse pointer is placed over the Karma score, the user’s Karma trend is shown.
  • Google
    • Collects a maximum of 1,000 top search listings
    • Writes the listing URL out to a text file
    • The interface allows certain parameters to be set, such as file type, language, and country.
    • More options can be added to the current list
    • Takes approximately 3 seconds per page of results (1 page = 100 results)
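The figures above (at most 1,000 listings, 100 results per page, about 3 seconds per page) determine how many result pages a run needs and roughly how long it takes. This sketch only computes zero-based page offsets and the time estimate; the actual request parameters the tool sends to Google are not described and are not assumed here.

```python
import math
from typing import List

MAX_RESULTS = 1000    # cap on collected listings, per the slide
PAGE_SIZE = 100       # results per page, per the slide
SECONDS_PER_PAGE = 3  # approximate time per page, per the slide


def page_offsets(wanted: int) -> List[int]:
    """Zero-based start offsets of the result pages needed for `wanted` listings."""
    wanted = min(wanted, MAX_RESULTS)
    return list(range(0, wanted, PAGE_SIZE))


def estimated_seconds(wanted: int) -> int:
    """Approximate run time: number of pages times the per-page cost."""
    return math.ceil(min(wanted, MAX_RESULTS) / PAGE_SIZE) * SECONDS_PER_PAGE


if __name__ == "__main__":
    print(page_offsets(1000), estimated_seconds(1000))
```

A full 1,000-listing run is therefore 10 pages and roughly 30 seconds per query.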
  • Twitter
    • Collects followers, following lists, and tweets from a chosen user
    • Subject to a 150-request rate limit imposed by Twitter
    • When the rate limit is reached, the program pauses and shows an indeterminate progress dialog until the limit resets
    • Users can log in with their Twitter credentials, which can optionally be stored for future sessions
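The pause-until-reset behaviour is a fixed-window rate limiter. The slide gives the limit (150) but not the window; this sketch assumes an hourly window as a default parameter, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class RateLimiter:
    """Allows `limit` calls per `window` seconds, then blocks until the window resets."""

    def __init__(self, limit: int = 150, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.limit, self.window = limit, window  # window length is an assumption
        self.clock, self.sleep = clock, sleep
        self.window_start = clock()
        self.used = 0

    def acquire(self) -> None:
        """Call before each API request."""
        now = self.clock()
        if now - self.window_start >= self.window:
            self.window_start, self.used = now, 0  # window expired: start a new one
        if self.used >= self.limit:
            # Limit reached: a GUI would show its indeterminate progress dialog here.
            self.sleep(self.window_start + self.window - now)
            self.window_start, self.used = self.clock(), 0
        self.used += 1
```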
  • Twitter Extractor: overview. Sharing a similar interface and extraction mechanism with the Cyworld Extractor, this application requires the URL of a user on Twitter. It is then possible to collect all of the user’s tweets and determine the attributes of the user’s follower/following network.
  • Twitter Extractor: data. A simple use of this data is to visualize a user’s network and determine which relationships are reciprocal.
  • A case study on Twitter use by members of the 18th National Assembly: types of tweets, audiences of tweets, and topics of tweets
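Reciprocal relationships are the intersection of a user's followers and the accounts the user follows; the set differences give the one-way ties. A minimal sketch, assuming the extractor's output has been loaded into two sets of account names:

```python
from typing import Dict, Set


def reciprocal(followers: Set[str], following: Set[str]) -> Set[str]:
    """Accounts that follow the user and are also followed back by the user."""
    return followers & following


def one_way(followers: Set[str], following: Set[str]) -> Dict[str, Set[str]]:
    """Non-reciprocated ties in each direction."""
    return {
        "follower_only": followers - following,   # they follow the user, not followed back
        "following_only": following - followers,  # the user follows them, not followed back
    }
```

Marking reciprocal ties separately (for example, as undirected edges in the visualization) makes mutual clusters stand out from one-way attention.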
  • Twtkr.com Scraper
  • Twitter.com vs. Twtkr.com
    • Korean Twitter messages are not well indexed on Twitter.com
    • Twtkr.com is customized for retrieving Korean Twitter messages
    • A scraper was built to automate data collection
    • Korean tweets containing ‘Sejong City’ (세종시) were harvested daily during March 2010
  • Sejong City Project
    • Current President Lee MB is trying to change the existing plan, drafted by ex-President Roh MH, which is structured around relocating several government offices to the city
    • Proponents of the original plan: necessary for regional development
    • Opponents: partitioning the capital would weaken Seoul’s competitiveness
  • Identifying the ‘twitter-tariat’
    • Twitter-tariat: a group that responds to and gives meaning to social issues via Twitter (modified from N. Anstead & B. O’Loughlin’s ‘viewertariat’)
    • Top 10 Twitter users by number of tweets related to Sejong City (세종시)
    • Investigating who they are, who follows them, who they follow, what they tweet; and ‘networked’ positions among peers
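Ranking users by how many of their harvested tweets mention the keyword is a frequency count. A minimal sketch, assuming the daily harvest has been stored as (username, text) pairs:

```python
from collections import Counter
from typing import Iterable, List, Tuple

KEYWORD = "세종시"  # Sejong City


def top_tweeters(tweets: Iterable[Tuple[str, str]], n: int = 10,
                 keyword: str = KEYWORD) -> List[Tuple[str, int]]:
    """Rank users by how many of their tweets contain `keyword` (ties keep first-seen order)."""
    counts = Counter(user for user, text in tweets if keyword in text)
    return counts.most_common(n)
```

The resulting top-10 list is the starting point for the follow-up questions above: who these users are, who follows them, whom they follow, and what they tweet.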
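  • Figure: Korean ‘twitter-tariat’ on Sejong City during March 2010, by type (media company, individual), location (Jeju, Seoul, Chungnam, overseas, Daejeon, other S. Korea), and count
  • Figure: tweets (keyword: 세종시), by type (media outlet, individual) and region (Jeju, Seoul, Chungnam, overseas, Daejeon, other Korea)
  • Figure: followers (keyword: 세종시), by type (media outlet, individual) and region (Jeju, Seoul, Chungnam, overseas, Daejeon, other Korea)
  • Figure: following (keyword: 세종시), by type (media outlet, individual) and region (Jeju, Seoul, Chungnam, overseas, Daejeon, other Korea)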
  • Korean Internet Network Miner: A Korean version of ICTA
  • Section 1. Development of the Korean Internet Network Miner
    • After the blog data were retrieved, they were processed to build two types of networks.
    • First, a chain network was extracted. In the chain network, one commentator is connected to another if the first commentator directly replied to the second by clicking on the "reply-to" button. However, after manually examining a number of comments on several blogs, we found that some comments are not "reply-to" comments but nevertheless address or reference a previous poster. To capture these missing connections, we decided to rely on another network discovery method called the name network.
    This observation is in line with a previous empirical study of online learning communities by Gruzd (2009a), which found that the chain network misses on average 40% of possible connections.
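A chain network as described above links a commentator to the author of the comment they explicitly replied to. This sketch assumes comments are records with hypothetical `id`, `author`, and `reply_to` fields (the Korean Internet Network Miner's actual data format is not given); a comment that addresses someone without using the reply button produces no edge, which is exactly the gap the name network is meant to fill.

```python
from collections import Counter
from typing import Dict, List, Optional


def chain_network(comments: List[Dict[str, Optional[str]]]) -> Counter:
    """Weighted directed edges (replier, replied-to author) built from reply-to links.

    Each comment is a dict with 'id', 'author' and 'reply_to' (the parent comment id, or None).
    """
    author_of = {c["id"]: c["author"] for c in comments}
    edges: Counter = Counter()
    for c in comments:
        parent = c.get("reply_to")
        if parent is not None and parent in author_of:
            target = author_of[parent]
            if target != c["author"]:  # ignore replies to oneself
                edges[(c["author"], target)] += 1
    return edges
```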
  • Section 1. Development of the Korean Internet Network Miner
    • Name network
    • Another good example of the challenges associated with the name/nickname disambiguation problem in comments is the word "2mb", which has at least three different meanings. First, it can be used as a nickname by one of the blog commentators. Second, it could refer to the capacity of computer memory (2 megabytes). Finally, it could be an alias of the current Korean president, Lee Myung-Bak.
    • To address these challenges and develop recommendations for the next generation of the name network discovery algorithm, we conducted a semi-automated analysis of all names/nicknames discovered in a sample dataset using our initial algorithm.
  • Section 2. Evaluation of the Name Network Discovery Algorithm
    • The evaluation procedure involved clicking on each word found by the name network algorithm and exploring the context in which each instance of the word was used (see Figure 3). The purpose of this semi-automated analysis was to discover which name/nickname candidates were identified incorrectly and why.
    • <Figure 3> A list of messages containing "2MB"
    • This semi-automated analysis revealed a set of additional syntactic and semantic clues that can be used to improve the accuracy of the name network discovery algorithm.
  • Section 2. Evaluation of the Name Network Discovery Algorithm
    The second set includes clues suggesting that a word is NOT likely to be used as a nickname:
      ● the word candidate is a phrase, for example when the nickname input (the "FROM" field) is used more like a subject line (possible indicators include white spaces and length);
      ● the word candidate consists of a single character (e.g., "a" or "ㄱ");
      ● the word candidate consists of netspeak, including emoticons (e.g., "=_="), slang and abbreviations (e.g., using "2MB" to refer to the current Korean president), and onomatopoeia (e.g., "ㅉㅉ" = tsk tsk, "ㅋㅋ" = heehee, "하하" = haha, "음" = hmm);
      ● the word candidate appears more than once in the comment;
      ● the word candidate consists of random characters (e.g., "ㅁㄴㅇㄹ" or "asdf");
      ● the word candidate is a short, conversational word or phrase (e.g., "나나" = me, "아이고" = oh no, "그래서" = so/therefore);
      ● the word candidate is a common word or idea in the given context/topic (e.g., "대한민국" = Republic of Korea, "쥐체사상" = a newly coined word used to refer to political fanatics).
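Several of these negative clues translate directly into rules for discarding nickname candidates. This is a hypothetical rule-based sketch, not the authors' revised algorithm: the word lists contain only the examples given in the text, the 15-character phrase threshold is an arbitrary assumption, and "bare jamo" is approximated with the Hangul Compatibility Jamo block (U+3131 to U+318E).

```python
import re
from typing import Iterable, List

# Hypothetical stop lists seeded only with the examples above; real lists would be much larger.
NETSPEAK = {"2mb", "=_=", "ㅉㅉ", "ㅋㅋ", "하하", "음"}
CONVERSATIONAL = {"나나", "아이고", "그래서"}
TOPIC_WORDS = {"대한민국", "쥐체사상"}
KEYBOARD_MASH = {"asdf", "ㅁㄴㅇㄹ"}
JAMO_ONLY = re.compile(r"^[\u3131-\u318E]+$")  # bare consonants/vowels such as ㅋㅋ or ㄱ


def unlikely_nickname(candidate: str, comment: str, max_len: int = 15) -> bool:
    """Return True if any 'NOT a nickname' clue fires for this candidate."""
    word = candidate.strip()
    lowered = word.lower()
    if " " in word or len(word) > max_len:            # looks like a phrase or subject line
        return True
    if len(word) <= 1:                                # single character, e.g. "a" or "ㄱ"
        return True
    if lowered in NETSPEAK or JAMO_ONLY.match(word):  # netspeak, slang, onomatopoeia
        return True
    if comment.count(word) > 1:                       # repeated within the same comment
        return True
    if lowered in KEYBOARD_MASH:                      # random characters
        return True
    if word in CONVERSATIONAL or word in TOPIC_WORDS:  # conversational or topic words
        return True
    return False


def filter_candidates(candidates: Iterable[str], comment: str) -> List[str]:
    """Keep only candidates that no negative clue rules out."""
    return [c for c in candidates if not unlikely_nickname(c, comment)]
```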
  • http://www.openamplify.com/ 1,000 free requests per day
  • Chosun vs. OhMyNews
    • The influential print-media establishment is composed of the "big three" conservative dailies, the Chosun, JoongAng and Dong-A Ilbos, which lead the nation in circulation.
    • OhMyNews: a new type of participatory journalism, with thousands of ordinary citizens as contributors.
  • OhMyNews vs. Chosun: emotionality comparison (Jul 2009 to Feb 2010)
    • Using sentiment analysis, we are trying to find differences and similarities in the emotional polarity of the main topics covered in news stories by OhMyNews versus Chosun.
    • "MEAN POLARITY" represents polarity on a scale from -1 (negative) to 1 (positive) for 78 popular topics covered in both newspapers.
    • For example, the topic "Uganda" tends to be mentioned in a positive context by OhMyNews but in a negative context by Chosun, while the topic "opposition" tends to be neutral in OhMyNews but positive in Chosun, and so on.
  • Web archiving of Korean MPs: http://www.web-archive.kr/
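The mean-polarity comparison between OhMyNews and Chosun amounts to averaging per-article polarity scores by topic and outlet, then comparing the two averages. This sketch assumes the sentiment service's output has already been reduced to (outlet, topic, polarity) triples with polarity in [-1, 1]; it does not reproduce OpenAmplify's API or response format.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_polarity(scored: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Average polarity scores in [-1, 1] by topic, then by outlet."""
    buckets = defaultdict(lambda: defaultdict(list))
    for outlet, topic, score in scored:
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"polarity out of range: {score}")
        buckets[topic][outlet].append(score)
    return {topic: {outlet: mean(scores) for outlet, scores in by_outlet.items()}
            for topic, by_outlet in buckets.items()}


def polarity_gap(result: Dict[str, Dict[str, float]], topic: str,
                 a: str = "OhMyNews", b: str = "Chosun") -> float:
    """Positive when outlet `a` covers the topic more positively than outlet `b`."""
    return result[topic][a] - result[topic][b]
```

Sorting the 78 topics by `polarity_gap` would surface cases like "Uganda", where the two outlets diverge most in tone.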
  • Experimentation with new types of data visualization across periods and platforms (e.g., dynamic mappings using HNA)
    • Data Collection for Web 1.0
    • Official homepages of South Korean Assembly members
    • Manual collection: Observation
    • Inter-linkage: Who links to whom matrix
    • Explicit links only, excluding links on bulletin boards
    • 2-Year tracking of same Assembly members: 2000-2001
    Sociology of Hyperlink Networks of Web 1.0, Web 2.0, and Twitter
  • Web 1.0 networks: 2000 vs. 2001
    • 59 isolates in 2000
    • more centralised in 2001
    • network of 2001 ➭ a ‘star’ network
    • might have been affected by political events ➭ the run-up to the 2002 presidential election
    • Data collection for Web 2.0
    • Personal blogs of South Korean Assembly members
    • Manual collection: Observation
    • Blogroll links only, excluding links in postings
    • Inter-linkage: Who links to whom matrix
    • 2-Year tracking of same Assembly members: 2005-2006
    • Phone interviews about usage behaviours
  • Web 2.0 networks: 2005 vs. 2006
    • hubs disappearing
    • blogs are easy to use
    • clear boundaries between different parties
    • strong presence of GNP Assembly members ➭ party policy on using blogs
    • Twitter
    • more connections between different parties
    • the ruling party pays less attention to alternative media
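  • Network measures by Web type and year

    | Web type | Year | Sum of links (mean) | Density | Centralisation (in) | Centralisation (out) | Gini coefficient |
    |---|---|---|---|---|---|---|
    | Web 1.0 (N=245) | 2000 | 373 (1.52) | 0.006 | 1.84 | 69.33 | 0.984 |
    | | 2001 | 515 (2.10) | 0.009 | 1.19 | 99.55 | 0.996 |
    | Web 2.0 (N=99) | 2005 | 652 (6.59) | 0.067 | 22.07 | 41.66 | 0.759 |
    | | 2006 | 589 (5.95) | 0.061 | 20.67 | 35.10 | 0.763 |
    | Twitter (N=22) | 2009 | 111 (5.05) | 0.240 | 24.72 | 39.68 | 0.408 |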
    • Network analysis
      • Web 1.0 (homepage): loose, few important hubs, becoming a star network
      • Web 2.0 (blog): denser, clear boundaries between opposing groups
      • Twitter: denser than blog networks
      • driven by technological development ➭ more interactive/participatory
    • Findings on online activities (Web 2.0 & Twitter) reflect offline situations
      • Party policies affected the use of the Web for political purposes
      • Progressive/minor groups more willing to explore alternative media
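The mean and density columns in the network-measures table are consistent with the standard directed-network definitions: mean = links / N and density = links / (N(N − 1)), the share of possible directed ties that are present. This sketch recomputes both from a who-links-to-whom edge list and checks the reported densities; it does not attempt the centralisation or Gini columns, whose exact formulas are not given.

```python
from typing import Iterable, Tuple

# (network, N, sum of links, reported density), copied from the table
REPORTED = [
    ("Web 1.0, 2000", 245, 373, 0.006),
    ("Web 1.0, 2001", 245, 515, 0.009),
    ("Web 2.0, 2005", 99, 652, 0.067),
    ("Web 2.0, 2006", 99, 589, 0.061),
    ("Twitter, 2009", 22, 111, 0.240),
]


def density(n_nodes: int, n_links: int) -> float:
    """Observed directed ties over possible directed ties (self-links excluded)."""
    return n_links / (n_nodes * (n_nodes - 1))


def network_stats(n_nodes: int, edges: Iterable[Tuple[str, str]]) -> Tuple[int, float, float]:
    """Sum of links, mean links per node, and density of a binary directed network.

    Self-links are dropped and repeated (from, to) pairs count once,
    as in a 0/1 who-links-to-whom matrix.
    """
    links = {(a, b) for a, b in edges if a != b}
    total = len(links)
    return total, total / n_nodes, density(n_nodes, total)


if __name__ == "__main__":
    for label, n, links, reported in REPORTED:
        print(f"{label}: mean={links / n:.2f} density={density(n, links):.3f} (reported {reported})")
```

For example, the Twitter network has 111 links among 22 members: 111 / 22 ≈ 5.05 links per member and 111 / (22 × 21) ≈ 0.240, matching the table.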
  • Incoming international hyperlinks in 2009 (drawn using ManyEyes.com)
  • Incoming international hyperlinks in 2009 (drawn using Google Earth)
  • Thank you for listening!
    WCU WEBOMETRICS INSTITUTE
    Acknowledgments: The WCU Webometrics Institute acknowledges that this research is supported by the WCU project investigating Internet-based politics using e-research tools, funded by the South Korean Government.