Target link presentation

  • this approach is largely descriptive and does not consider the accompanying text
  • LingPipe is a comprehensive NLP toolkit and the methods used in the developed Java class enabled three forms of analysis: Sentiment Analysis, Collocation, and Language Identification
  • Wall-cleaning (Raynes–Goldie, 2010) occurs when the owner of a profile page periodically or reactively evaluates comments and deletes those that cast the owner in an unfavorable light. However, whilst the occurrence of this process on Facebook is not in question, the degree to which it happens has been challenged. Walther et al. (2008) explain that deleting content, regardless of whether it is deemed negative or unflattering, is avoided because it contravenes the spirit of open content. Smith and Kidder (2010) extend this concept to other forms of user-generated content and explain that social norms deter users from deleting content once it is in the community. The practice of deleting content in Korea, however, appears not to be restrained by the same unwritten rules that govern Facebook. Yoo (2009) explains that submissions to user message boards are routinely deleted if the SNS page owner judges them to be unflattering or negative. In addition to this cultural practice of deleting content, there is also a legal motivation to remove content deemed incorrect or negative. The extent to which deletion occurs remains unclear, although legal frameworks exist in Korea and elsewhere to encourage the deletion of content by either the service provider or the owner of the SNS account.
  • For example, the linking of a petition to call upon the governing president to be impeached combined with the name of the president occurring frequently and the negative sentiment recorded does point to….

    1. What is the link and text doing here: A Case Study of Cyworld Minihompies in Korea
        • Steven Sams and Han Woo Park
    2. Background
        • This study analyses user-generated comments posted to Korean politicians on SNS Cyworld that contain a URL
        • The study examines the type of service being linked to through the URL and determines the frequency of services
        • A developed program captures all comments given to a selected set of politicians within a predefined timeframe
        • The text component of messages is analyzed using two separate machine-learning mechanisms
    3. Types of Hyperlinks
        • Five social functions that hyperlinks can be said to perform (Ackland et al., 2010): Information Provision, Network Strengthening, Identity Building, Audience Sharing, and Message Amplification
    4. Online Korean Political Sphere
        • As in other countries, Korean politicians are increasingly turning to social networks as a means to engage with their electorate
        • In 2007 Cyworld commanded a penetration rate of one third of the total population of South Korea, and since then all indications are that this proportion has increased.
    5. Sample
        • 130 Korean National Assembly Members’ Cyworld Minihompies
        • The date parameters of the study were April 2008 – June 2009
        • 153,602 comments were collected for the period chosen for the study
        • 1,276 comments contained links
    6. Data Collection Method
        • A program was developed that performs an HTTP call to request one page of comments from the politician’s visitor board
        • The content and date of each comment are isolated and held in temporary storage
        • The process repeats until the target date parameters have been met
    7. Data Analysis Method: Links
        • The links are checked to determine the number of unique URLs and the corresponding number of unique domains. These links and domains are then manually categorised by website type (such as portals, media, parties, homepages of politicians, petition sites, online fan clubs, and NGOs)
        • The location of each service is found using a network query tool to determine the proportion of domestic and international websites
    8. Data Analysis Method: Text
        • Natural Language Processing (NLP) is one approach to categorising a large body of text for which obtaining accurate results manually is unfeasible
        • A rudimentary Java class was developed that wrapped a small subset of the methods provided in the LingPipe API so that they could be called on the extracted text comments
        • The developed Java class enabled two forms of analysis: Sentiment Analysis and Collocation
    9. Sentiment Analysis
        • A polarity analyser was developed that is able to locate significant word combinations and, using the developed corpus model as a training dataset, determine if the combination is generally positive or negative
        • An accessible corpus of positive and negative sentiment composed in Korean has yet to be realized
        • A sample body of 2,000 Korean text statements was coded into objective, subjective-positive, and subjective-negative categories
    10. Collocation
        • Collocation analysis can determine which tokens are more frequently found together than would normally be expected. Collocation can identify proper nouns in this way (such as the names of persons, places, or events) that would be lost if the frequency of each token were analysed in isolation.
    11. Results - Links
        • 153,602 comments were collected for the period chosen for the study
        • 1,276 comments contained hyperlinks
        • The total link count was 1,920, as it was common for an individual posting to contain more than one hyperlink
        • 762 were unique full URLs and 259 were unique domains
        • 1,849 URLs encountered in the sample were found to belong to services based in Korea and 71 to international services
        • Message amplification and network building were prominent reasons for posting links
    12. Table 1: LexiURL Unique / Full hosts
        • Based on the top 10 domains (24.5%) by occurrence out of 259
    13. Table 2: LexiURL Unique / Full URLs
    14. Table 3: Total links to each domain (Korea)
        • Based on 1,078 (58.3%) of 1,849 links to Korean services
    15. Table 4: Total links to each domain (Overseas)
        • Based on 51 (71.8%) of 71 links to overseas services
    16. Table 5: poster-gender and politician background
    17. Table 6: Comments categorized by link type from the six groups of gender and political affiliation
        • Based on 206 comments agreed on by both coders from the initial set of 300
    18. Results - Text
        • May and June 2008 were found to have high numbers of comments containing links that showed negative sentiment, and this period corresponds with the candlelight protest
        • May 2009 also shows large numbers of comments containing hyperlinks that indicate negative sentiment, coinciding with the suicide of ex-president Roh Moo-Hyun
        • The name of Korean President Lee Myung-bak was found to occur 229 times
        • Terms pertaining to the candlelight protests, such as Mad Cow disease, beef, American goods, and candlelight protest, occurred frequently
        • Gini coefficient and a less formal term describing a similar measurement of wealth occurred frequently
    19. Figure 1. Positive and negative sentiment from comments containing links
    20. Confidence Levels
        • To determine the effectiveness of the classification approach, 10% of the training data was removed from the training set and used to evaluate the developed model. This approach allows the classification to be tested against known human-classified data. The Average Conditional Probability score provides a basis for determining the ability of the classifier to correctly identify positive and negative sentiment. Based on the training set used, the Average Conditional Probability was found to be 87%.
    21. Limitations
        • Comments containing links made up less than 1% of all comments posted to the sample of politicians, which indicates that although previous studies have shown how links can support communication in SNSs, links remain rare in the Korean online political environment
        • Comments deleted over the period of the study may mean that the full extent of negative sentiment towards politicians is not captured
        • The practice of deleting content in Korea has been found to be less constrained by social norms than in Western SNSs, such as Facebook
        • Legal mechanisms also exist in Korea to encourage the removal of negative content during election periods
    22. Conclusion
        • Links are almost solely targeted to Korean domestic services, and the few that do point to overseas sites are usually related in some way to domestic issues in Korea
        • Males are marginally more likely than females to comment on Cyworld Minihompies using links, and those Minihompies managed by ruling politicians were found to be of greater prominence than those of the opposition parties
        • Message Amplification and Network Building were found to be the dominant purposes for submitting links within user-generated comments
        • Two forms of machine-learning analysis, sentiment analysis and collocation of significant phrases, revealed primarily negative sentiment towards President Lee and his role in the reintroduction of American beef imports. Issues surrounding the suicide of ex-President Roh suggested anger towards those who were seen to be harassing him prior to his death
    23. Acknowledgement
        • Research for this paper has been supported by the World Class University (WCU) program through the National Research Foundation of Korea, which is funded by the Ministry of Education, Science and Technology (No. 515-82-06574).
    24. Thank you
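
The collection loop described on slide 6 (Data Collection Method) could be sketched roughly as follows. This is not the authors' program: the names `collect_comments` and `fake_fetch_page` are hypothetical, the HTTP request and page parsing are replaced by a stand-in function, and the sketch assumes that pages are returned newest first.

```python
from datetime import date
from typing import Callable, List, Tuple

Comment = Tuple[date, str]  # (date posted, comment text)


def collect_comments(
    fetch_page: Callable[[int], List[Comment]], start: date, end: date
) -> List[Comment]:
    """Request one page at a time and keep comments dated within [start, end]."""
    collected: List[Comment] = []
    page = 1
    while True:
        comments = fetch_page(page)
        if not comments:
            break  # no more pages on the visitor board
        for posted, text in comments:
            if start <= posted <= end:
                collected.append((posted, text))
        # Assumption: pages are in reverse chronological order, so once a page
        # reaches back past the start date there is nothing left to collect.
        if min(posted for posted, _ in comments) < start:
            break
        page += 1
    return collected


def fake_fetch_page(page: int) -> List[Comment]:
    """Hypothetical stand-in for the HTTP call; returns made-up comments."""
    pages = {
        1: [(date(2009, 7, 2), "after the window"),
            (date(2009, 6, 30), "inside the window")],
        2: [(date(2008, 4, 1), "first day of the window"),
            (date(2008, 3, 31), "before the window")],
        3: [(date(2008, 1, 1), "never requested")],
    }
    return pages.get(page, [])


sample = collect_comments(fake_fetch_page, date(2008, 4, 1), date(2009, 6, 30))
print(len(sample))
```

Passing the fetcher in as a function keeps the stopping rule separate from the network code, so the loop can be exercised without contacting a live service.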
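
The link counts described on slide 7 (total links, unique URLs, unique domains) could be computed along these lines. This is a minimal sketch under stated assumptions: the study used its own tooling, the regular expression is a simple illustrative pattern, "domain" is taken here to mean the URL's host name, and the sample comments and the function name `summarise_links` are invented for the example.

```python
import re
from urllib.parse import urlsplit

# Simple illustrative pattern: http(s) URLs up to the next whitespace or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def summarise_links(comments):
    """Count linking comments, total links, unique URLs and unique host names."""
    comments_with_links = 0
    urls = []
    for comment in comments:
        # Strip trailing punctuation that usually belongs to the sentence, not the URL.
        found = [u.rstrip(".,;)") for u in URL_PATTERN.findall(comment)]
        if found:
            comments_with_links += 1
        urls.extend(found)
    hosts = {urlsplit(u).hostname for u in urls}
    hosts.discard(None)  # malformed URLs have no host name
    return {
        "comments_with_links": comments_with_links,
        "total_links": len(urls),
        "unique_urls": len(set(urls)),
        "unique_domains": len(hosts),
    }


sample_comments = [
    "Please sign http://petition.example.kr/sign?id=1 and http://petition.example.kr/sign?id=1",
    "News: http://news.example.com/a.",
    "no link here",
    "See http://NEWS.example.com/b",
]
summary = summarise_links(sample_comments)
print(summary)
```

A comment can carry more than one link, which is why the total link count can exceed the number of linking comments, as it does in the reported results.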
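
The idea behind slide 10, tokens found together more often than expected, can be illustrated with pointwise mutual information (PMI) over adjacent token pairs. PMI is swapped in here as a simple, well-known measure; it is not necessarily the statistic used by LingPipe or by the authors, and the function name `bigram_pmi` and the English sample tokens are made up for the example.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent token pairs by PMI = log2(p(a, b) / (p(a) * p(b)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = len(tokens) - 1
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable scores
        p_pair = count / n_pairs
        p_a = unigrams[a] / n_tokens
        p_b = unigrams[b] / n_tokens
        scores[(a, b)] = math.log2(p_pair / (p_a * p_b))
    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = (
    "candlelight protest beef imports candlelight protest today "
    "the beef the protest candlelight protest"
).split()
ranked = bigram_pmi(tokens)
print(ranked)
```

A positive score means the pair occurs together more often than its individual frequencies would predict, which is how a multi-word name can surface even when its parts are common on their own.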
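
The figures reported on slides 11, 14, 15 and 21 can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are taken from the slides.

```python
total_comments = 153_602
comments_with_links = 1_276
korean_links = 1_849
overseas_links = 71

# Korean plus overseas links should give the total link count on slide 11.
total_links = korean_links + overseas_links

# Share of all comments that contained at least one link (slide 21: "less than 1%").
share_with_links = 100 * comments_with_links / total_comments

# Shares covered by Table 3 (Korea) and Table 4 (overseas).
table3_share = 100 * 1_078 / korean_links
table4_share = 100 * 51 / overseas_links

print(total_links)
print(round(share_with_links, 2))
print(round(table3_share, 1), round(table4_share, 1))
```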
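
The hold-out evaluation on slide 20, where 10% of the training data is set aside to test the model, could be set up roughly as below. This is a sketch only: the function names are hypothetical, the split is a seeded random shuffle (the slides do not say how the 10% was chosen), and the score computed is plain accuracy rather than the Average Conditional Probability the authors report.

```python
import random


def hold_out_split(items, fraction=0.10, seed=0):
    """Shuffle a copy of the data and set aside `fraction` of it for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_held_out = round(len(shuffled) * fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


def accuracy(classify, labelled_examples):
    """Fraction of (text, label) pairs for which classify(text) matches the label."""
    if not labelled_examples:
        return 0.0
    correct = sum(1 for text, label in labelled_examples if classify(text) == label)
    return correct / len(labelled_examples)


# Assumption for illustration: 2,000 coded statements, as on slide 9.
training, held_out = hold_out_split(range(2000))
print(len(training), len(held_out))
```

Because the held-out items never reach the training step, their human-assigned labels give an independent check on the classifier.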