China HCI Symposium 2010 March: Augmented Social Cognition Research from PARC, by Ed H. Chi
1. Enhancing the Social Web through Augmented Social Cognition Research Ed H. Chi 紀懷新 , Area Manager Peter Pirolli, Lichan Hong, Bongwon Suh, Gregorio Convertino, Les Nelson, Rowan Nairn Augmented Social Cognition Area Palo Alto Research Center Interns: Sanjay Kairam, Jilin Chen, Michael Bernstein Alumni: Raluca Budiu, Bryan Pendleton, Niki Kittur, Todd Mytkowicz, Terrell Russell, Brynn Evans, Bryan Chan, KMRC students Image from: http://www.flickr.com/photos/ourcommon/480538715/ 2010-03-15 Ed H. Chi ASC Overview
9. Wikipedia Success is counter-intuitive “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.” – Steve Carell, The Office
13. Characterization, Models, Prototypes, Evaluations
14. Conflict/Coordination Effects in Wikipedia [Kittur et al., CHI2007] (joint work with Niki Kittur, Bongwon Suh, Bryan Pendleton)
22. Opinions on Dokdo/Takeshima (figure groups: Group A, Group B, Group C, Group D)
    Number of users in user group        A    B    C    Total
    Users with Korean point of view      10   6    0    16
    Users with Japanese point of view    1    8    7    16
    Neutral or Unidentified              7    3    6    17
23. Mediator Pattern - Terri Schiavo: Mediators; Sympathetic to parents; Sympathetic to husband; Anonymous (vandals/spammers)
24. Ratio of Reverted Contribution. Chart: Monthly Ratio of Reverted Edits
25. Characterization, Models, Prototypes, Evaluations
26. Example: Modeling Wikipedia Growth. Bongwon Suh, Gregorio Convertino, Ed H. Chi, Peter Pirolli. The Singularity is Not Near: Slowing Growth of Wikipedia. In Proc. of WikiSym 2009. Oct, 2009. Florida, USA
28. Slowing Growth in Global Activity. Chart: Monthly Active Editors
34. Using Information Theory to Model Social Tagging [Ed H. Chi, Todd Mytkowicz, ACM Hypertext 2008]. Diagram labels: Topics, Users, Documents, Concepts, Tags T_1 … T_n, Encoding, Decoding, Noise
36. H(Doc | Tag), browsability
37. I(Doc; Tag), Mutual Information. Source: Hypertext 2008 study on del.icio.us (Chi & Mytkowicz)
38. Rise in average tags per bookmark (note the parallel with the increasing number of query words)
39. Understanding a new area… Characterization, Models, Prototypes, Evaluations
40. MrTaggy.com: social search browser with social bookmarks. Joint work with Rowan Nairn, Lawrence Lee. Kammerer, Y., Nairn, R., Pirolli, P., and Chi, E. H. 2009. Signpost from the masses: learning effects in an exploratory social tag search browser. In Proceedings of the 27th international Conference on Human Factors in Computing Systems (Boston, MA, USA, April 04 - 09, 2009). CHI '09. ACM, New York, NY, 625-634.
43. TagSearch: Use Semantic Analysis to Reduce Noise. http://mrtaggy.com. Semantic Similarity Graph of tags: Guide, Web, Howto, Tips, Help, Tools, Tip, Tricks, Tutorial, Tutorials, Reference
46. Understanding a new area… Characterization, Models, Prototypes, Evaluations
52. Living Laboratory: Prototyping Social Applications on the Internet. Create a Living Laboratory as a platform to develop, test, and market innovations [HCIC workshop 2009, HCII 2009, IEEE Computer Sep/2008]
66. A way to think about these systems: Voting systems; Collaborative Co-Creation; Col. Information Structures. Diagram labels: Heavier collaboration; Naver, Digg.com, Wikipedia, Slashdot, eHow.com, Del.icio.us, IBM dogear, PageRank, Flickr
68. WikiDashboard: Social Transparency for Wikipedia. Joint work with Bongwon Suh, Aniket Kittur, Bryan Pendleton. Bongwon Suh, Ed H. Chi, Aniket Kittur, Bryan A. Pendleton. Lifting the Veil: Improving Accountability and Social Transparency in Wikipedia with WikiDashboard. In Proceedings of the ACM Conference on Human-factors in Computing Systems (CHI2008). ACM Press, 2008. Florence, Italy.
70. Top Editor - Wasted Time R
79. TagSearch Exploratory Focus: 3 kinds of search
    navigational (28%): You know what you want and where it is
    transactional (13%): You know what you want to do
    (navigational and transactional: existing search engines are OK)
    informational (59%): You roughly know what you want but don’t know how to find it. Difficult for existing search engines. Opportunity
80. SparTag.us: Social Paragraph-level Tagging. Joint work with Lichan Hong, Raluca Budiu, Les Nelson, Peter Pirolli. Lichan Hong, Ed H. Chi, Raluca Budiu, Peter Pirolli, and Les Nelson. SparTag.us: A Low Cost Tagging System for Foraging of Web Content. In Proceedings of the Advanced Visual Interface (AVI2008), (to appear). ACM Press, 2008.
84. Duplicate Content via Paragraph Fingerprinting [Hong and Chi, CHI2009]
85. My Reading Notebook
86. Social Sharing: friend’s tags, my tags, my highlights, friend’s highlights
88. Experimental Evaluation: Significant Learning Gain [Nelson et al., CHI2009]
    N=18. SparTag.us + Friend superior to both individual conditions. No difference between the two controls.
    Condition                        M      SD
    Without SparTag.us (WS)          0.27   0.23
    SparTag.us Only (SO)             0.13   0.32
    SparTag.us With A Friend (SF)    0.46   0.22
Editor's Notes
PARC FORUM *this week*: Thursday May 1, 4:00 – 5:00 pm, George E. Pake Auditorium at Palo Alto Research Center (www.parc.com/directions)
TITLE: "Enhancing the Social Web through Augmented Social Cognition research"
SPEAKER: Ed Chi, PARC Augmented Social Cognition group
ABSTRACT: We are experiencing the new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by Wikipedia and del.icio.us, Web 2.0 environments are turning people into social information foragers and sharers. Users interact to resolve conflicts and jointly make sense of topic areas from “Obama vs. Clinton” to “Islam.” PARC's Augmented Social Cognition researchers -- who come from cognitive psychology, computer science, HCI, sociology, and other disciplines -- focus on understanding how to “enhance a group of people’s ability to remember, think, and reason”. Through Web 2.0 systems like social tagging, blogs, Wikis, and more, we can finally study, in detail, these types of enhancements on a very large scale. In this Forum, we summarize recent PARC work and early findings on: (1) how conflict and coordination have played out in Wikipedia, and how social transparency might affect reader trust; (2) how decreasing interaction costs might change participation in social tagging systems; and (3) how computation can help organize user-generated content and metadata.
ABOUT THE SPEAKER: Ed H. Chi is a senior research scientist and area manager of PARC's Augmented Social Cognition group. His previous work includes understanding Information Scent (how users navigate and make sense of information environments like the Web), as well as developing information visualizations such as the "Spreadsheet for Visualization" (which allows users to explore data through a spreadsheet metaphor where each cell holds an entire data set with a full-fledged visualization). He has also worked on computational molecular biology, ubiquitous computing systems, and recommendation and personalized search engines. Ed has over 19 patents and has been conducting research on user interface software systems since 1993. He has been quoted in the Economist, Time Magazine, LA Times, Slate, and the Associated Press. Ed completed his B.S., M.S., and Ph.D. degrees from the University of Minnesota between 1992 and 1999. In his spare time, he is an avid Taekwondo black belt, photographer, and snowboarder.
This is the final talk in our "Going Beyond Web 2.0" speaker series. Previous talks in this series, as well as other recent Forum talks, are available online at www.parc.com/forums.
This clip is from a comedy show, but it raises a serious question as well. What does happen when you have millions of people with different viewpoints all editing the same content? Well, you get a lot of conflict. I’m going to briefly go through an example of conflict that occurred on one of the most heavily edited pages in Wikipedia, which is, <pause>, you guessed it, about our own George W.
In the enterprise, these have become the standard set of Web 2.0 tools in practice. They have several benefits: they can be set up by end users without needing IT, and they have familiar UIs from consumer versions. In terms of knowledge sharing, an important advantage these tools have over traditional KM systems is that knowledge can be captured and archived through the act of communication without requiring extra work by users. These tools will become increasingly important in the office as younger people enter the workforce and expect to be able to use them.
Paste controversial tag picture here Figure depicting CRC
We selected a set of page metrics that we could scale to compute across large numbers of pages.
This graph is just running the model on the list of controversial topics; it is not cross-validation. Its R-squared is actually 0.897.
Especially interesting: unique editors DECREASE conflict. Anonymous edits are bad when on the discussion page but not the article page. Change to 1,2,3,4... and up/down arrows
1m
By 2013, fewer than 10k new articles per month are expected to be added. Knowledge does not stop growing!
Therefore, this is the model that we suggest!
There are really two facets of tagging. The first is encoding: you encounter a document, read or skim it, and have to generate a few words that describe it. The second side of tagging is retrieval: you find a new document that has several tags attached to it, and you read those tags and the document. The tags may give you an idea of what the document is about. I am going to come back to this distinction later.
Vocabulary saturation! The plot shows a marked increase in the entropy of the tag distribution, H(T), up until week 75 (mid-2005), at which point the entropy measure hits a plateau. Since the total number of tags keeps increasing, tag entropy can only stay constant in the plateau by having the tag probability distribution become less uniform. What this suggests is that users are having a hard time coming up with “unique” tags. That is to say, a user is more likely to add a tag to del.icio.us that is already popular in the system than to add a tag that is relatively obscure.
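To make the entropy argument concrete, here is a minimal Python sketch of how H(T) could be computed from tag usage counts. The function name `tag_entropy` and the two sets of counts are hypothetical, invented for illustration; they are not taken from the del.icio.us data. The sketch only shows that, for the same vocabulary, a distribution concentrated on popular tags has lower entropy than a uniform one.

```python
import math
from collections import Counter

def tag_entropy(tag_counts):
    """Shannon entropy H(T), in bits, of a tag usage distribution."""
    total = sum(tag_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in tag_counts.values() if c > 0)

# Hypothetical counts: four tags used equally often (uniform distribution).
uniform = Counter({"web": 25, "howto": 25, "tools": 25, "design": 25})
# Hypothetical counts: the same four tags, with usage concentrated on one popular tag.
skewed = Counter({"web": 85, "howto": 5, "tools": 5, "design": 5})

print(tag_entropy(uniform))  # 2.0 bits, the maximum for four tags
print(tag_entropy(skewed))   # lower, because the distribution is less uniform
```

This is why a growing vocabulary with flat entropy points to a less uniform distribution: new tags add little probability mass, while already-popular tags absorb most of the new usage.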
What’s perhaps the most telling data of all is the entropy of documents conditional on tags, H(D|T) , which is increasing rapidly (see Figure 4). What this means is that, even after knowing completely the value of tags, the entropy of the document is still increasing. Conditional Entropy asks the question: “Given that I know a set of tags, how much uncertainty regarding the document set that I was referencing with those tags remains?” This measure gives us a method for analyzing how useful a set of tags is at describing a document set. The fact that this curve is strictly increasing suggests that the specificity of any given tag is decreasing. That is to say, as a navigation aid, tags are becoming harder and harder to use. We are moving closer and closer to the proverbial “needle in a haystack” where any single tag references too many documents to be considered useful.
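The conditional entropy described above can be sketched in a few lines of Python using the identity H(D|T) = H(D,T) - H(T), with mutual information as I(D;T) = H(D) - H(D|T). The function names and the two toy bookmark lists are assumptions made for illustration only; they are not the data or the code from the Hypertext 2008 study.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def conditional_entropy(pairs):
    """H(D|T) = H(D,T) - H(T), from observed (document, tag) pairs."""
    joint = Counter(pairs)
    tags = Counter(tag for _, tag in pairs)
    return entropy(joint) - entropy(tags)

def mutual_information(pairs):
    """I(D;T) = H(D) - H(D|T)."""
    docs = Counter(doc for doc, _ in pairs)
    return entropy(docs) - conditional_entropy(pairs)

# Hypothetical bookmarks where every tag points to exactly one document.
specific = [("doc1", "python"), ("doc2", "cooking"),
            ("doc3", "travel"), ("doc4", "music")]
# Hypothetical bookmarks where one generic tag covers all four documents.
generic = [("doc1", "web"), ("doc2", "web"),
           ("doc3", "web"), ("doc4", "web")]

print(conditional_entropy(specific))  # 0.0: knowing the tag identifies the document
print(conditional_entropy(generic))   # 2.0: the tag leaves all four documents possible
print(mutual_information(specific))   # 2.0 bits shared between documents and tags
print(mutual_information(generic))    # 0.0: the tag carries no information
```

The generic case is the "needle in a haystack" situation: the tag narrows nothing down, so all of the uncertainty about the document remains.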
Figure 6 shows the number of tags per bookmark over time. The trend is clearly increasing, complementing the increase in navigation difficulty.
Posing the right questions is half of the work.
Voting systems: faddishness of information, social dashboards. Col. info. structures: explicit social networks. Collaborative co-creation.
In other words, a person did not see both a high-trust and low-trust visualization for the same page.
Remember, our goal is not to see whether they noticed the visualization or not, but how much impact it could have.
So we ran two parts of the experiment; here are the combined results. Notice two things: a huge effect, with no significant interactions (trust was impacted), and a bi-directional change in trust: an increase over baseline and a decrease below baseline.
Informational search – ambiguity in query – where social search has most power
What is the valuable problem addressed by this research program? What is the target (user, company, application, market), what is our place in the value chain, and what is the business model to bring value to the target and PARC?
As you can tell from my demo, what is being tagged are paragraphs. This is based on our intuition that although there are cases where it makes sense to tag the whole document, there are many other cases where the interesting nuggets of information are at the sub-document level, for example, entities, facts, concepts, and paragraphs. Our implementation focuses on paragraphs for now. The key idea is that we compute a unique fingerprint for each paragraph that we encounter. Currently, we use the Secure Hash Algorithm to compute the paragraph fingerprint. We plan to explore other approaches in the future. This simple idea of paragraph fingerprinting has also been picked up by other projects in UbiDocs.
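A minimal Python sketch of the fingerprinting idea follows. The notes say only that a Secure Hash Algorithm is used, so the choice of SHA-1 via `hashlib`, the whitespace and case normalization, the in-memory `annotations` store, and the function names are all assumptions for illustration and not the SparTag.us implementation.

```python
import hashlib
import re

def paragraph_fingerprint(text):
    """Return a SHA-1 hex digest of a whitespace-normalized, lowercased paragraph.

    The normalization step is an assumption; it lets copies of the same
    paragraph with different spacing map to the same fingerprint.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Hypothetical store: annotations are keyed by fingerprint, not by URL.
annotations = {}

def annotate(paragraph, tag):
    """Attach a tag to a paragraph, wherever that paragraph appears."""
    annotations.setdefault(paragraph_fingerprint(paragraph), set()).add(tag)

def lookup(paragraph):
    """Return the tags attached to this paragraph's fingerprint, if any."""
    return annotations.get(paragraph_fingerprint(paragraph), set())

annotate("Tags attach to the paragraph text itself.", "tagging")

# The same paragraph found on another page, with different spacing.
print(lookup("Tags attach  to the paragraph\ntext itself. "))  # {'tagging'}
```

Keying annotations by content rather than by URL is what would let them reappear when the same paragraph is republished on a different site.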
Here is an example of duplicate content. Here we have a story at Forbes.com which is about the recent tragedy happening in Minnesota and I annotated part of the story. Here on a different web site, the same story appears and my annotations show up too.
As I browse the web and annotate the pages, one of the things that SparTag.us automatically creates for me is a notebook which contains all the paragraphs that I have annotated. Here it shows when I annotated this paragraph. Here is an option that allows me to make my annotations on this paragraph private. Here are the URLs that I have visited and that contain this paragraph. And I can search my notebook against the tags that I specified, the text that I highlighted, the text of the paragraphs that I annotated, or the URLs. By the way, this last one was suggested by Prateek, who was a subject in our last user study. And here is a tag cloud, which is really a representation of what kind of keywords I have been using as tags.
The way that we support social sharing is through a simple user interface like this. Here I designate myself as a fan of Ed, which means that I can see his annotations. When I go to this web page, I see that Ed has been here before and decided to leave some annotations. Of course, I can highlight or tag this paragraph too. Now, if I don’t want to be Ed’s fan anymore, I can remove his name from my friend list. And his annotations disappear too. And because this is done in AJAX, there is no need to reload the page.
A nice thing about SparTag.us is that when you come to a web page, it sort of tells you what may be interesting to pay attention to. Here it reminds me that these are two paragraphs that I have annotated. Here I see that Ed has annotated this paragraph.