
Content-Based Social Network Analysis of Online Communities

From primath, 9 months ago

Anatoliy Gruzd and Caroline Haythornthwaite (U of Illinois)


Slideshow transcript

Slide 1: Content-Based Social Network Analysis of Online Communities
Anatoliy Gruzd, Caroline Haythornthwaite
Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Social Network/ing Symposium, Toronto, 2007

Slide 2: The Problem
• Online communities are creating a growing volume of texts contributed by a growing number of participants
  - 100 million posters in Usenet (Marc Smith, quoted in CNET, 2003)
  - 3.12 terabytes of data *daily* on Usenet (2007)
  - By 2010, >70% of digital content will be user-generated, and the majority of it will still be text-based (Technology Consultancy IDC)
[Figure: Growth of Usenet (Wikipedia, Oct. 2007)]

Slide 3: Making Sense of Community Action
• How can we help make sense of community action and interaction based solely on textual interchanges?
• How can we make the social structures evident to participants, and to managers or teachers?
• How can we move from linear streams of text to visualized patterns of interaction?
[Figure: Growth of blog activity, March 2003-2006; 175,000 new blogs a day (2006)]

Slide 4: Mapping Online Communities
• Mappings and internal examinations tend to be based on one aspect of ties:
  - Links between sites
  - Reports of friendship or work relations
  - FOAF declarations
• With a concentration on quantity over content
[Figure: Flickrverse, Gustavog, 2006, http://www.flickr.com/photo_zoom.gne?id=9708628&context=set-222111&size=l; based on 50 connections between people]

Slide 5: Mapping Online Communities (2)
• Emerging mappings include attention to:
  - Poster activity
  - Actor profiles as posters
  - Content of sites, e.g., words in common on different sites (Gloor & Zhao, 2006)
[Figure: Welser, Gleave, Fisher & Smith, 2007, in JOSS]

Slide 6: Extracting Network Information
• Determine who is talking to whom
  - Applying social network analysis techniques
• Determine what they are talking about
  - Applying natural language processing techniques
• Merge these to produce network detection that better represents ongoing processes

Slide 7: Our Goal
• Use natural language processing (NLP) to:
  - Enhance the current techniques of building social networks
  - Gain more information and insight about nodes, relations, and ties
• Current focus is on bulletin boards
• Current example is an online learning environment
• Procedures are being derived for use with groups of unknown membership

Slide 8: Adding More with NLP
• Revealing network information:
  1. Node discovery
  2. Tie discovery
  3. Relation discovery
  4. Role & group discovery
• Network visibility rather than aggregate behavior
• Important for revealing structures to participants, so they can understand the 'lay of the (cyber)land', and to instructors (or managers), so they can oversee participation and intervene as necessary

Slide 9: Adding Relational Information
• Few studies (yet) derive relations from content, which can reveal:
  - Networks based on multiple relations
  - Change in discourse over time
  - Changes in associations among network members by relation and time
• Few deal with the vagaries of CMC texts:
  - Bulletin boards, chat
  - Incorrect spelling, partial sentences, inventive punctuation
  - Deriving who is talking to whom from content analysis
• Or with local language conventions:
  - Acronyms, group naming conventions, group word-use conventions, nicknames for people and processes

Slide 10: Nodes and Ties
• Focus today is on node and tie discovery
  - Identifying who the actors in the network are
  - Identify nodes, i.e., people
  - Make the tie(s) between nodes
• Two approaches:
  - Chain network, based on the chain of postings
  - Name network, based on names used in the text

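Slide 11: Chain Network: Definition Options
Reference chain A -> B -> C (A = thread starter, C = last poster), followed by a post from the sender D. Columns give the weight of D's tie to each earlier poster.
Option | Definition | A | B | C
1 | Connect the sender to the last person in the post chain only (undirected) | 0 | 0 | 1
2 | Connect the sender to the last and first (= thread starter) persons in the chain, and assign equal weight values (e.g., 1) to both ties | 1 | 0 | 1
3 | Same as option 2, but the tie between the sender and the first person has half weight (e.g., 0.5) | .5 | 0 | 1
4 | Connect the sender to all people in the reference chain with decreasing weights | .25 | .5 | 1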
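The four options can be made concrete with a small function. This is a minimal sketch, not the authors' implementation: the name `chain_tie_weights` is hypothetical, the reference chain is assumed to be a list ordered from thread starter to most recent poster, and option 4's halving per step back is an assumption chosen because it reproduces the example weights (.25, .5, 1).

```python
def chain_tie_weights(chain, option):
    """Weights for ties from a new sender to earlier posters in a reference chain.

    `chain` lists earlier posters from the thread starter (first element)
    to the most recent poster (last element). Option numbers follow the
    slide; the halving rule in option 4 is an assumption.
    """
    if not chain:
        return {}
    first, last = chain[0], chain[-1]
    weights = {}
    if option == 1:
        # Option 1: tie to the last person in the chain only
        weights[last] = 1.0
    elif option in (2, 3):
        # Options 2 and 3: tie to the thread starter and to the last person;
        # option 3 gives the thread-starter tie half weight
        weights[first] = 1.0 if option == 2 else 0.5
        weights[last] = 1.0  # wins if first and last are the same person
    elif option == 4:
        # Option 4 (assumed halving): weight 1 for the last poster, 0.5 one step back, ...
        for steps_back, person in enumerate(reversed(chain)):
            weight = 0.5 ** steps_back
            # a person who posted several times keeps their highest weight
            weights[person] = max(weights.get(person, 0.0), weight)
    else:
        raise ValueError(f"unknown option: {option}")
    return weights
```

For the chain on the slide, `chain_tie_weights(["A", "B", "C"], 4)` gives C weight 1.0, B 0.5 and A 0.25, matching the last row of the table. Posters who receive weight 0 are simply left out of the result.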

Slide 12: Chain Networks: Missed Information
• Ex. 1: The previous post is by Gabriel. Sam replies: ‘Nick, Ann, Gina, Gabriel: I apologize for not backing this up with a good source, but I know from reading about this topic that libraries…’
• Ex. 2: Previous posts are by Gabriel, Sam, Gina, and Eva; then: ‘Gina, I owe you a cookie. This is exactly what I wanted to know. I was already planning on taking 302 next semester, and now I have something to look forward to!’
• Ex. 3: Post by Fred: ‘I wonder if that could be why other libraries around the world have resisted changing – it's too much work, and as Dan pointed out, too expensive.’

Slide 13: Name Networks
• Make use of node and tie information that is in the text of the postings
• Issues:
  - Disambiguating names/nicknames from text
  - Disambiguating names of people in the group from names of people being discussed (e.g., the subject)
  - Detecting aliases for a given person, and disambiguating two or more users with the same name

Slide 14: Hand Coding: Categories
Network participants
• <from> = person indicated in the 'from' line of the post heading (NB: the only information that is system generated)
• <addressee> = direct reference to another person ('I agree with you Todd')
• <reference> = indirect reference to another person ('Todd has a good point')
• <self-reference> = poster references themselves in some way (braindead library student, high school teacher, etc.)
• <signature> = name as given by the message author on their post
Named non-participants
• <subject>, <subject 2>, or <subject 3> = name is a subject of the discussion, given as one name (Dewey), two (Brewster Kahle) or three (Charles R. Darwin)
• <non-group reference> = reference to a person who is neither in the group nor the subject, e.g., a former professor
Error
• <error> = new name appears because of an error (e.g., Lackie as a subject instead of Leckie, or part of a prevpost line does not conform to the usual format)
Previous posts (if not removed from the dataset)
• <previous-poster> = when the previous message is included, this indicates its poster ('Janice wrote: ') (system generated)
• <copy> = name appears because it is included with the previous message

Slide 15: Examples of Hand Coding
Example 1: 'Just a note to clarify something in yesterday's lecture/chat session. I mentioned that Monday's NY Times had an article on <#1><subject> Brewster. I want to clarify that the article concerns the copyright extension law and the current Supreme Court case <#1><subject> Eldred v. <#1><subject> Ashcroft, set to begin today, I believe. <#1><subject 2> Brewster Kahle is currently touring the country in a bookmobile … For more info on this … you can refer to the Web site that <#1><reference> Jodie mentioned yesterday… <#1><signature>LA'
NB: Jodie may not even appear in the contributors to this thread.
Example 2: 'Several of our programs at UC <#7><subject> Davis have well-intentioned lower division research methods classes that introduce then never reinforce basic skills.'
NB: Need to disambiguate "UC Davis" from someone called "Davis".
Example 3: 'Research (to paraphrase my hero, <#8><subject> Shrek) is like onions. Not because it stinks, but because it is made up of layers.'
NB: "Shrek" as a name will not appear in conventional name lists.
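Hand-coded passages like these could be turned into (message, category, name) records, for instance to compare them with automated output. A minimal sketch: `parse_hand_coding` is hypothetical and assumes the `<#N><category>` markup shown in the examples, that the numeric suffix in `<subject 2>` or `<subject 3>` gives the number of words in the name (as the Slide 14 categories describe), and that every other category tags exactly one word.

```python
import re

# Matches an annotation such as "<#1><subject 2>" plus any whitespace after it
TAG = re.compile(r"<#(\d+)><([^>]+)>\s*")
PUNCT = ".,;:!?()'\""


def parse_hand_coding(text):
    """Return (message_id, category, name) tuples for hand-coded names.

    Assumptions: "<subject 2>" and "<subject 3>" tag two- and three-word
    names; every other category tags the single word that follows it.
    """
    results = []
    for match in TAG.finditer(text):
        message_id = int(match.group(1))
        category = match.group(2).strip()
        label, _, suffix = category.rpartition(" ")
        if suffix.isdigit():
            n_words = int(suffix)          # e.g. "subject 2" -> label "subject", 2 words
        else:
            label, n_words = category, 1   # e.g. "non-group reference" -> 1 word
        words = text[match.end():].split()[:n_words]
        name = " ".join(word.strip(PUNCT) for word in words)
        results.append((message_id, label, name))
    return results
```

On the first example, this yields `(1, "subject", "Brewster")`, `(1, "subject", "Eldred")`, `(1, "subject", "Ashcroft")`, `(1, "subject", "Brewster Kahle")`, `(1, "reference", "Jodie")` and `(1, "signature", "LA")`.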

Slide 16: Automated Node & Tie Discovery Method
1. Determine names in the dataset, and assign a probability value
2. Determine the email-address-to-name relationship
3. Assign a tie weight to each discovered tie

Slide 17: Automated Node Discovery
• Named entity recognition
  - Discovery of personal names, using the 1990 US Census name lists (http://www.census.gov/genealogy/names) and capitalization
• Distinguishing between names of people in and outside the class
  - Having a list of names doesn't always work, e.g., if someone uses their middle name and it is not on the name list, or uses a short name or nickname
  - Method: associate names with email addresses in the class, relying on content-based (e.g., context words) and structure-based (e.g., word position) features of names
• Issues
  - Many names, same person
  - Same name, many people
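The name-list-plus-capitalization step can be sketched as follows. This is an illustration only: `KNOWN_NAMES` is a tiny hypothetical stand-in for the Census name files, `candidate_names` is a hypothetical helper, and the tokenizer is a simple regular expression. The sketch also exhibits the limitation noted on the slide: a name missing from the list (a nickname, or "Shrek") is never proposed.

```python
import re

# Hypothetical stand-in for a first-name list such as the 1990 US Census files;
# a real run would load the full list instead of this handful of names.
KNOWN_NAMES = {"dustin", "sam", "nick", "charlie", "wilma", "gina"}

# A word: a letter followed by letters, apostrophes or hyphens
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def candidate_names(text, known_names=KNOWN_NAMES):
    """Return (word_index, token) for capitalized words found in the name list."""
    candidates = []
    for index, match in enumerate(WORD.finditer(text)):
        token = match.group()
        if token[0].isupper() and token.lower() in known_names:
            candidates.append((index, token))
    return candidates
```

For the opening of Wilma's message, `candidate_names("Hi Dustin, Sam, Nick and all,")` returns `[(1, "Dustin"), (2, "Sam"), (3, "Nick")]`; the word indices can then feed position-based features.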

Slide 18: Automated Node Discovery (2)
EXAMPLE
From: wilma@bedrock.us (=Wilma)
Reference chain: tank123@gl.edu (=Dustin) => hle@gl.edu (=Sam)
'Hi Dustin, Sam, Nick and all, I appreciate your posts from this and last week […]. I keep thinking of poor Charlie who only wanted information on “dogs” Sam has been talking about. […] Wilma.'

Words to the left | Name | Words to the right | Position % | Score "TO" | Score "FROM"
* Hi | Dustin | Sam, Nick, | 0 | 0.322 | -0.004
* Dustin, | Sam | Nick and | 1 | 0.321 | -0.002
Dustin, Sam, | Nick | and all, | 2 | 0.320 | -0.001
of poor | Charlie | who only | 50 | 0.05 | 0.04
on “dogs” | Sam | has been | 65 | 0.285 | 0.07
* | Wilma | * | 88 | 0.0012 | 0.116
* = end of the line
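The context features in the table (up to two words on either side, an end-of-line marker, and relative position) could be extracted as below. A hedged sketch: `name_context` is hypothetical, position is assumed to be the name's word index as a percentage of message length, and the whole message is treated as a single line. The slide does not say how the "TO" and "FROM" scores are computed, so they are not reproduced here.

```python
def name_context(words, index, window=2, boundary="*"):
    """Context features for the name at words[index].

    Returns up to `window` words on each side, with `boundary` marking the
    start or end of the message, plus the name's position as a percentage
    of the message length (measured in words, an assumption).
    """
    left = words[max(0, index - window):index]
    if index - window < 0:
        left = [boundary] + left          # the window runs past the start
    right = words[index + 1:index + 1 + window]
    if index + 1 + window > len(words):
        right = right + [boundary]        # the window runs past the end
    position_pct = round(100 * index / len(words))
    return {"left": left, "right": right, "position_pct": position_pct}
```

For `words = "Hi Dustin Sam Nick and all".split()`, `name_context(words, 1)` gives left `["*", "Hi"]` and right `["Sam", "Nick"]` for Dustin, echoing the first row of the table; the position value differs from the slide's because this six-word input is only the start of the message.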

Slide 19: Automated Tie Discovery
• Associate each sender in the class with all names mentioned in his/her emails. For example:
  - Wilma -> Dustin = tank123@gl.edu
  - Wilma -> Charlie: no email for Charlie, so not a person in the conversation group (cf. 'when Steve and I took Professor Sid's course last year')
  - Wilma -> (no name mentioned): information on the tie is only in the chain network; this could be the start of a thread, a change of topic within a thread, or a general posting
• Assign tie weight:
  - Pair counts
  - Mutual information
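Both tie-weighting options named on the slide, pair counts and mutual information, can be computed from (sender, mentioned name) pairs. This sketch assumes pointwise mutual information estimated from raw counts and a `name_to_email` mapping produced by the node-discovery step; `tie_weights` and its arguments are hypothetical. Names without an email address (like Charlie) are dropped, and, as an added assumption, so are self-mentions such as Wilma's signature.

```python
import math
from collections import Counter


def tie_weights(mentions, name_to_email):
    """Weight sender -> person ties by pair count and pointwise mutual information.

    `mentions` holds one (sender_email, mentioned_name) pair per name
    occurrence. Names with no known email address and self-mentions
    are dropped (the latter is an assumption).
    """
    pairs = [
        (sender, name_to_email[name])
        for sender, name in mentions
        if name in name_to_email and name_to_email[name] != sender
    ]
    total = len(pairs)
    pair_counts = Counter(pairs)
    sender_counts = Counter(sender for sender, _ in pairs)
    target_counts = Counter(target for _, target in pairs)
    weights = {}
    for (sender, target), count in pair_counts.items():
        # PMI = log2(P(s,t) / (P(s) * P(t))) with probabilities estimated from counts,
        # which simplifies to log2(count * total / (count_s * count_t))
        pmi = math.log2(count * total / (sender_counts[sender] * target_counts[target]))
        weights[(sender, target)] = {"count": count, "pmi": pmi}
    return weights
```

Pair counts reward frequent mentions; PMI instead rewards mentions that are more common than the sender's and target's overall activity would predict, so a widely mentioned person does not dominate every tie.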

Slide 20: Chain vs. Name Networks
• The name network adds information beyond the chain network
• Ex.: bulletin boards #06, 07, 08
  - Nodes: 37; Messages: 346
  - Chain network ties: 223
  - Name network ties: 215 / 429
  - Shared ties: 140
  - QAP Pearson correlation: 0.453 (p = .000)
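A QAP correlation compares two networks over the same nodes while respecting the dependence among cells of a network matrix: the null distribution comes from relabeling the nodes of one network, not from shuffling individual cells. A minimal sketch, assuming both networks are n x n weight matrices in the same node order; the function names, permutation count, and one-sided p-value with add-one correction are choices made here, not necessarily those behind the reported figure.

```python
import random


def offdiagonal(matrix, order=None):
    """Flatten the off-diagonal cells, optionally after relabeling nodes by `order`."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def qap_correlation(a, b, permutations=1000, seed=0):
    """Correlate two n x n tie matrices and estimate a one-sided QAP p-value.

    Rows and columns of `b` are permuted together, which keeps its
    structure intact while breaking its alignment with `a`.
    """
    rng = random.Random(seed)
    x = offdiagonal(a)
    observed = pearson(x, offdiagonal(b))
    order = list(range(len(b)))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(x, offdiagonal(b, order)) >= observed:
            at_least_as_large += 1
    # add-one correction so the estimated p-value is never exactly zero
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value
```

With `a` as the chain-network matrix and `b` as the name-network matrix, `qap_correlation(a, b)` returns the observed correlation and the share of node relabelings that match or exceed it.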

Slide 21: An Ego Network for Brent
[Figure: name network and chain network for Brent]
Visualization powered by http://www.netvis.org

Slide 22: An Ego Network for Tyler
[Figure: name network and chain network for Tyler]
Annotations:
• kurt -> Kurt Cobain, lead singer of the rock band Nirvana
• dewey -> John Dewey, philosopher & educator
• santa_monica -> Santa Monica Public Library
• mark -> markup language
Visualization powered by http://www.netvis.org

Slide 23: Conclusion
Uses and benefits of content-based networks:
• Discovery of social network behavior rather than posting behavior
• Discovery of social interactions between group members that happened outside the group (e.g., a fishing trip)
• Discovery of relations between group members and people outside the group (e.g., a shared friend from another department)
• Expert/co-discussant discovery
• Study of perceived social networks without directly collecting survey data from participants (?)

Slide 24: References and Further Reading
Related papers:
• Haythornthwaite, C., & Gruzd, A. (2007). A noun phrase analysis tool for mining online community. In C. Steinfield, B. T. Pentland, M. Ackerman, & N. Contractor (Eds.), Communities and Technologies 2007 (pp. 67-86). London: Springer.
• Welser, H. T., Gleave, E., Fisher, D., & Smith, M. (2007). Visualizing the signatures of social roles in online discussion groups. Journal of Social Structure, 8(2). http://www.cmu.edu/joss/content/articles/volume8/Welser/