Using Information Scent to Model Users in Web1.0 and Web2.0

This talk summarizes the work I have been doing on modeling user behavior on Web1.0 and Web2.0 systems over the last 13 years.


Talk given at a workshop on Cognitive Modeling in Utrecht, Netherlands on March 20, 2010.



  • Title: Modeling of Web Users from Web1.0 to Web2.0. Abstract: In this talk, I will provide a perspective on how information scent techniques have enabled us to characterize and model individual web surfers in the Web1.0 world, and how we used those techniques to build applications and systems. Then I will present some ideas of how we might bridge these ideas to the Web2.0 world by modeling groups of users using Web2.0 systems.
  • Example: media news is fresh. With the right interest, users have a high probability of following that piece of information. A hunter's strategies maximize the benefit per cost of pursuing prey. Information gatherers do exactly the same thing.
  • Statistically, a correlation coefficient above 0.8 is generally considered strong, between 0.5 and 0.8 moderate, and below 0.5 weak. Twelve of the 32 tasks correlated strongly, and seventeen correlated moderately.
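The thresholds in this note amount to a simple classification rule. A minimal sketch (the function name is illustrative; the thresholds and sample coefficients are the ones stated in this talk):

```python
def correlation_strength(r):
    """Classify a correlation coefficient by the conventional thresholds:
    above 0.8 strong, 0.5 to 0.8 moderate, below 0.5 weak."""
    r = abs(r)
    if r > 0.8:
        return "strong"
    if r >= 0.5:
        return "moderate"
    return "weak"

# Sample coefficients drawn from the Bloodhound results table.
labels = [correlation_strength(r) for r in (0.9022, 0.7528, 0.4701)]
# labels == ["strong", "moderate", "weak"]
```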
  • Using our technology, by telling the web site your special requirements, each virtual aisle of the web site is pre-highlighted according to your request, making it easier for you to shop.
  • In the enterprise, these have become the standard set of Web 2.0 tools in practice. They have several benefits: they can be set up by end users without needing IT, and they have familiar UIs from consumer versions. In terms of knowledge sharing, an important advantage these tools have over traditional KM systems is that knowledge can be captured and archived through the act of communication without requiring extra work by users. These tools will become increasingly important in the office as younger people enter the workforce and expect to be able to use them.
  • There are really two facets of tagging. The first is encoding: when you encounter a document, have read or skimmed it and have to generate a few words that describe it. The second side of tagging is retrieval: you find a new document that has several tags attached to it, and you read those tags and the document. The tags may give you an idea about what the document is about. I am going to come back to this distinction later.
  • Vocabulary saturation: the data show a marked increase in the entropy of the tag distribution H(T) up until week 75 (mid-2005), at which point the entropy measure hits a plateau. Since the total number of tags keeps increasing, tag entropy can only stay constant in the plateau by having the tag probability distribution become less uniform. What this suggests is that users are having a hard time coming up with “unique” tags. That is to say, a user is more likely to add a tag to del.icio.us that is already popular in the system than to add a tag that is relatively obscure.
  • What’s perhaps the most telling data of all is the entropy of documents conditional on tags, H(D|T), which is increasing rapidly (see Figure 4). What this means is that, even after fully knowing the tags, the uncertainty about the document is still increasing. Conditional entropy asks the question: “Given that I know a set of tags, how much uncertainty remains about the document set I was referencing with those tags?” This measure gives us a method for analyzing how useful a set of tags is at describing a document set. The fact that this curve is strictly increasing suggests that the specificity of any given tag is decreasing. That is to say, as a navigation aid, tags are becoming harder and harder to use. We are moving closer and closer to the proverbial “needle in a haystack,” where any single tag references too many documents to be considered useful.
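The quantities in these notes, H(T), H(D|T), and the mutual information I(D;T), can be computed directly from a corpus of (document, tag) pairs. A minimal sketch, assuming bookmarks are given as a list of (doc, tag) tuples; the function names and toy data are illustrative, not from the del.icio.us study:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency distribution given as a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def tagging_entropies(pairs):
    """Compute H(T), H(D|T), and I(D;T) from (doc, tag) pairs."""
    tag_counts = Counter(t for _, t in pairs)
    doc_counts = Counter(d for d, _ in pairs)
    joint_counts = Counter(pairs)
    h_t = entropy(tag_counts)
    h_d = entropy(doc_counts)
    h_joint = entropy(joint_counts)   # H(D, T)
    h_d_given_t = h_joint - h_t       # chain rule: H(D|T) = H(D,T) - H(T)
    mi = h_d - h_d_given_t            # I(D;T) = H(D) - H(D|T)
    return h_t, h_d_given_t, mi

# Toy bookmark set (illustrative only).
pairs = [("d1", "python"), ("d1", "code"), ("d2", "python"), ("d3", "news")]
h_t, h_d_given_t, mi = tagging_entropies(pairs)
# h_t = 1.5, h_d_given_t = 0.5, mi = 1.0
```

Tracking these three numbers over time is what reveals the saturation and browsability trends described above: a flattening H(T) with a rising H(D|T) means tags are getting less specific.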
  • Figure 6 shows the number of tags per bookmark over time. The trend is clearly increasing, complementing the increase in navigation difficulty.
  • We introduce a technique for creating novel, textually-enhanced thumbnails of web pages. These thumbnails combine the advantages of image thumbnails and text summaries to provide consistent performance on a variety of tasks. We conducted a study in which participants used three different types of summaries (enhanced thumbnails, plain thumbnails, and text summaries) to search web pages to find several different types of information. Participants took an average of 83 seconds to find the answer to a question. They were approximately 30 seconds faster with enhanced thumbnails than with text summaries, and 19 seconds faster with enhanced thumbnails than with plain thumbnails. Further, performance with enhanced thumbnails was much more consistent than with text summaries or plain thumbnails. In the images shown on this slide, the top row contains plain (scale-reduced) thumbnails of web pages. The bottom row contains thumbnails that have been enhanced in the following way: (1) the fonts in H1 and H2 tags have been modified so that they are readable in the thumbnails; (2) transparent, highlighted callouts have been included for keywords from the search query (appropriate highlighted colors were chosen based on visual attention models); and (3) the contrast level in the thumbnail has been reduced so that the callouts are more prominent and readable.
  • Informational search, where the query is ambiguous, is where social search has the most power.

Using Information Scent to Model Users in Web1.0 and Web2.0: Presentation Transcript

  • Modeling of Web Users from Web1.0 to Web2.0 Ed H. Chi, Principal Scientist and Area Manager Augmented Social Cognition Area Palo Alto Research Center Image from: http://www.flickr.com/photos/ourcommon/480538715/ 2010-03-20 Utrecht CogModeling
  • PARC Overview
    • Interdisciplinary research center
    • Founded in 1970
    • Spun out of Xerox in 2002
    • Business model:
      • Contract research
      • Licensing
      • Joint ventures
      • Spinoffs
  • PARC Innovation
    • chartered to create the architecture of information & the office of the future
    • invented distributed personal computing
    • established Xerox’s laser printing business
    • created the foundation for the digital revolution
    Graphical User Interface, Laser Printing, Ethernet, Bit-mapped Displays, Distributed File Systems, Page Description Languages, First Commercial Mouse, Object-oriented Programming, WYSIWYG Editing, Distributed Computing, VLSI Design Methodologies, Optical Storage, Client/Server Architecture, Device Independent Imaging, Cedar Programming Language
  • How do people navigate?
    • Scan
    • Skim
    • Decide
    • Action
  • Ecological Approach
    • human-information interaction is adaptive to the extent that it:
      MAXIMIZE [ Net Knowledge Gained / Costs of Interaction ]
  • Analogy to Optimal Foraging: Information is to the web user as Energy is to the forager.
  • Information Scent: The Theory
    • Information Scent is the user perception of the cost and value of information.
      • Similar to hunters following animal foot prints.
  • Information Scent: The Idea
    • Spreading activation
      • Bayesian prediction of relevance of individual elements
    [Diagram: the user Wants an information need (“new medical treatments procedures”) and Sees a text snippet (“cell patient dose beam”)]
  • Activation depends on a base level plus activation spread from associated chunks (example chunks: bread, butter, sandwich, flour):
    Ai = Bi + Σj Wj · Sji
    where Ai is the activation of chunk i, Bi is the base-level activation of chunk i, and the sum is the activation spread from linked chunks j.
    Bi = log( Pr(i) / Pr(not i) ): base-level activation reflects the log likelihood of chunk i occurring in the world.
    Sji = log( Pr(j|i) / Pr(j|not i) ): strength of spread reflects the log likelihood of i co-occurring with j.
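The activation equations on this slide can be turned into a small computation. A minimal sketch, where the chunk probabilities, link weights, and the "bread/butter" example values are illustrative numbers, not from the slide:

```python
import math

def base_activation(p_i):
    """B_i = log( Pr(i) / Pr(not i) ): log odds of chunk i occurring."""
    return math.log(p_i / (1.0 - p_i))

def association_strength(p_j_given_i, p_j_given_not_i):
    """S_ji = log( Pr(j|i) / Pr(j|not i) ): log likelihood ratio of j
    co-occurring with i versus occurring without i."""
    return math.log(p_j_given_i / p_j_given_not_i)

def activation(p_i, linked, weights):
    """A_i = B_i + sum_j W_j * S_ji.

    linked:  list of (Pr(j|i), Pr(j|not i)) for each associated chunk j.
    weights: the attention weights W_j on those links.
    """
    spread = sum(w * association_strength(pj_i, pj_not_i)
                 for w, (pj_i, pj_not_i) in zip(weights, linked))
    return base_activation(p_i) + spread

# Toy example: chunk "bread" linked to "butter" and "sandwich".
a_bread = activation(0.1, [(0.6, 0.05), (0.3, 0.02)], [1.0, 0.5])
```

Note how rare-but-strongly-associated chunks dominate: a chunk with a low base rate can still receive high activation when its neighbors co-occur with it far more often than by chance.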
  • Attacking The Problem
    • Users have information goals, their surfing patterns are guided by information scent
    • Two questions
      • Given an information goal and a starting point
        • Where do users go? (Behavior)
      • Given some surfing pattern
        • What is the user’s goal? (Need)
  • WUFIS: Web User Flow by Information Scent: given a user information goal and a Web site (page content and links), a Web user flow simulation produces predicted paths.
  • InfoScent: How does it work? Start users at a page with some goal, flow users through the network, and examine user patterns. Scent values are the probabilities of transition.
  • InfoScent Simulation: (1) from the query and the relevant docs, build a weight matrix; (2) combine with the topology matrix and normalize to probability, yielding the Scent Matrix (R = relevant documents, T = topology matrix); (3) with the Scent Matrix, perform spreading activation.
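The core of the flow simulation can be sketched as a row-normalized scent matrix used as transition probabilities, with a population of simulated users flowed through it. This is a minimal sketch under stated assumptions; the per-link scent scores and the toy 3-page site are illustrative, not the actual WUFIS implementation:

```python
def normalize_rows(scent):
    """Turn per-link scent scores into per-page transition probabilities."""
    probs = []
    for row in scent:
        total = sum(row)
        probs.append([s / total if total else 0.0 for s in row])
    return probs

def simulate_flow(scent, start_page, steps):
    """Flow users through the site from start_page, accumulating visits.

    scent[i][j] is the scent score of the link from page i to page j
    (0.0 where no link exists).
    """
    p = normalize_rows(scent)
    n = len(scent)
    users = [0.0] * n
    users[start_page] = 1.0          # all simulated users begin here
    visits = list(users)             # accumulated visitation distribution
    for _ in range(steps):
        users = [sum(users[i] * p[i][j] for i in range(n)) for j in range(n)]
        visits = [v + u for v, u in zip(visits, users)]
    return visits

# Toy 3-page site: page 0 links to pages 1 and 2; scent favors page 1.
scent = [[0.0, 3.0, 1.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
visits = simulate_flow(scent, start_page=0, steps=2)
```

The resulting `visits` vector is the predicted visitation distribution that Bloodhound compares against observed user traces.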
  • Proximal Cue Words
    • Goal: Find words that represent Information Cues for hyperlinks:
      • 1. Text of the link itself
      • 2. Words around the link (lists, paragraphs)
  • Information Cues
    • If the above two fail:
      • 3. Content words on the Distal Page
      • Title words of the Distal Page
  • Bloodhound Project: INPUT: a starting point (www.xerox.com) and a task (look for “high end copiers”); OUTPUT: usability metrics.
  • Input Tasks
  • Stanford CS
  • ONR
  • Instrumentation: WebLogger
  • User Traces
  • Compare Visitation Distributions
    • For each task, produce a user summary vector that describes the frequency distribution of page visits over the document space.
    • For each task, run Bloodhound to create the Bloodhound-predicted frequency distribution.
  • Results
    • Produced click streams that:
    • Correlated strongly 1/3 of the time
    • Correlated moderately slightly less than 2/3 of the time
      • Problem: we do not know a priori which third.
    Corr. Coeff.   Yahoo    REI      HivInSite   Parcweb
    task 1a        0.7528   0.4701   0.6811      0.7394
    task 1b        0.7218   0.4763   0.7885      0.8756
    task 2a        0.7489   0.9892   0.6671      0.8930
    task 2b        0.8840   0.7073   0.6880      0.8573
    task 3a        0.7768   0.7321   0.8835      0.7197
    task 3b        0.6973   0.6979   0.5660      0.7123
    task 4a        0.9022   0.9415   0.8407      0.8340
    task 4b        0.9052   0.7600   0.4634      0.9344
  • IUNIS: Inferring User Need by Info Scent: from observed paths through a Web site (page content and links), a Web user flow simulation infers the user information goal.
  • Evaluation of IUNIS
    • Procedure:
      • 10 Path booklets
      • Single rating sheet with the ten 20-word summaries. A copy of this rating sheet is attached to each of the 10 path booklets.
      • Users are asked to read through each booklet and rate each of the path summaries.
        • Each summary rated on a 5-point Likert scale.
        • Users also indicated which of the ten summaries was the best match.
  • Evaluation of IUNIS
    • Results:
      • Matching summary mean = 4.58 (median=5)
      • Non-matching summary mean = 1.97 (median=1)
      • Difference highly significant (p < .001)
      • Best match summary: 5.6 out of 10 (Cohen Kappa=0.51)
    • The evaluation yields strong evidence that IUNIS generates good summaries of the Web paths.
  • ScentTrails: Pre-highlight navigation path
    • A store that knows your goal.
    • Over 50% reduction in task time.
  • Web page with highlighted link anchors. Partial information goal: “remote diagnostic technology”; remainder of information goal: “speed >= 75” (e.g., choosing between 62 copies/min. and 92 copies/min.).
  • ScentTrails algorithm
    • Identify tasty pages
    • Waft scent backward along links
      • Loses intensity as it travels
    [Diagram: site tree (copiers, fax machines, other, maintenance; digital copiers, color copiers; XC4411, XC5001) with scent for “remote diagnostics” wafting back from the XC4411 copier features page]
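The "waft scent backward" step above can be sketched as iterative propagation along reversed links with a decay factor. A minimal sketch under stated assumptions; the link list, page names, and decay value are illustrative, not the actual ScentTrails algorithm parameters:

```python
def waft_scent(links, goal_scent, decay=0.5, iterations=10):
    """Propagate scent backward along links, losing intensity each hop.

    links:      list of (source, target) page pairs.
    goal_scent: dict mapping the 'tasty' goal pages to their initial scent.
    decay:      fraction of scent surviving each backward hop.
    """
    scent = dict(goal_scent)
    for _ in range(iterations):
        updated = dict(scent)
        for src, dst in links:
            # A page's scent is at least the decayed scent of a page it links to.
            candidate = decay * scent.get(dst, 0.0)
            if candidate > updated.get(src, 0.0):
                updated[src] = candidate
        scent = updated
    return scent

# Toy site: home -> copiers -> xc4411_features (the goal page).
links = [("home", "copiers"), ("copiers", "xc4411_features")]
scent = waft_scent(links, {"xc4411_features": 1.0})
# scent: xc4411_features = 1.0, copiers = 0.5, home = 0.25
```

The per-page scent values are then what drive the link-anchor highlighting: links leading toward the goal carry more residual scent and get stronger emphasis.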
  • Results of user study (times capped at five minutes): 10/12 subjects preferred ScentTrails to both searching and browsing.
  • ScentIndex: exact matches in red; associated entries underlined in red.
  • ScentHighlight: the user first types search keywords (“anthrax symptoms”); the system conceptually highlights relevant passages and keywords to draw the user’s attention.
  • Method
  • User Study Summary
    • Overall, the ScentIndex eBook performed better than the physical book.
    • Faster Speed:
      • Subjects using the ScentIndex were faster in completing their tasks no matter whether they were experts or novices, F(1,12)=12.96, p<.01.
    • More Accurate:
      • Answers that they provided while using ScentIndex interface were more accurate, F(1,12)=3.991, p=.06.
  • Heuristics: poor heuristic vs. good heuristic
  • “Hints”: solo vs. cooperative (“good hints”)
  • Finding a Restaurant
    • Appropriate for the occasion
  • Research Vision Augmented Social Cognition
    • Cognition : the ability to remember, think, and reason; the faculty of knowing.
    • Social Cognition : the ability of a group to remember, think, and reason; the construction of knowledge structures by a group.
      • (not quite the same as in the branch of psychology that studies the cognitive processes involved in social interaction, though included)
    • Augmented Social Cognition : Supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.
    • Citation: Chi, IEEE Computer, Sept 2008
  • Research Methodology
    • Characterize activity on social systems with analytics
    • Model social interaction and community dynamics and variables
    • Prototype tools to increase benefits or reduce cost
    • Evaluate prototypes via Living Laboratories with real users
    (Cycle: Characterization → Models → Prototypes → Evaluations)
  • Two Sides of Tagging
    • Encoding
    • Retrieval
    Examples: http://edge.org tagged “science research cognition”; http://www.ted.com/index.php/speakers tagged “video people talks technology”
  • Using Information Theory to Model Social Tagging [Ed H. Chi, Todd Mytkowicz, ACM Hypertext 2008]. [Channel model: concepts about documents are encoded into tags T1…Tn; noise enters the channel; readers decode the tags back into topics, users, and documents]
  • H(Tag) shows saturation in tag usage
  • H(Doc | Tag), browsability
  • I(Doc; Tag), mutual information. Source: Hypertext 2008 study on del.icio.us (Chi & Mytkowicz)
  • Rise in average tags per bookmark (note the parallel development of increasing numbers of query words)
  • Social Tagging Creates Noise
    • Synonyms
    • Misspellings
    • Morphologies
    • People use different tag words to express similar concepts.
  • TagSearch: Use Semantic Analysis to Reduce Noise (http://mrtaggy.com). [Semantic similarity graph over tags: guide, web, howto, tips, help, tools, tip, tricks, tutorial, tutorials, reference]
  • MapReduce Implementation
    • Spreading Activation in a bi-graph
    • Computation over a very large data set
      • 150 Million+ bookmarks
    [Bipartite graph: Tags ↔ URLs, with edge weights P(URL|Tag) and P(Tag|URL)]
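One round of spreading activation on the tag–URL bipartite graph can be sketched from bookmark counts alone. This is a minimal sketch under stated assumptions; the function names and the tiny bookmark dictionary are illustrative, not drawn from the 150M+ bookmark data set or the actual MapReduce job:

```python
from collections import defaultdict

def conditional_probs(bookmarks):
    """Estimate P(url|tag) and P(tag|url) from (tag, url) -> count data."""
    tag_totals = defaultdict(float)
    url_totals = defaultdict(float)
    for (tag, url), n in bookmarks.items():
        tag_totals[tag] += n
        url_totals[url] += n
    p_url_given_tag = {(t, u): n / tag_totals[t] for (t, u), n in bookmarks.items()}
    p_tag_given_url = {(t, u): n / url_totals[u] for (t, u), n in bookmarks.items()}
    return p_url_given_tag, p_tag_given_url

def spread(activation, bookmarks):
    """One round trip of activation: tags -> URLs -> back to tags."""
    p_u_t, p_t_u = conditional_probs(bookmarks)
    url_act = defaultdict(float)
    for (t, u), p in p_u_t.items():
        url_act[u] += activation.get(t, 0.0) * p
    tag_act = defaultdict(float)
    for (t, u), p in p_t_u.items():
        tag_act[t] += url_act[u] * p
    return dict(tag_act)

# Toy bookmark counts: tag "howto" co-occurs with "tutorial" on URL u1.
bookmarks = {("howto", "u1"): 3, ("tutorial", "u1"): 1, ("tutorial", "u2"): 2}
related = spread({"howto": 1.0}, bookmarks)
# related: howto = 0.75, tutorial = 0.25
```

In a MapReduce setting, each of the two propagation passes becomes a map over edges keyed by URL (then by tag) with a summing reduce, which is what makes the computation scale to hundreds of millions of bookmarks.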
  • Understanding a new area… (Characterization → Models → Prototypes → Evaluations)
  • MrTaggy.com: social search browser with social bookmarks. Joint work with Rowan Nairn and Lawrence Lee. Kammerer, Y., Nairn, R., Pirolli, P., and Chi, E. H. 2009. Signpost from the masses: learning effects in an exploratory social tag search browser. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (Boston, MA, USA, April 04-09, 2009). CHI '09. ACM, New York, NY, 625-634.
  • TagSearch Architecture
    • MapReduce: reduced months of computation to a single day
    • Development of a novel scoring function
  • Understanding a new area… (Characterization → Models → Prototypes → Evaluations)
  • Baseline Interface
  • Experiment Design
    • 2 interface × 3 task domain design
      • 2 Interface (between-subjects)
        • Exploratory vs. Baseline
      • 3 task domains (within-subjects)
        • Future Architecture, Global Warming, Web Mashups
    • 30 Subjects (22 male, 8 female)
      • Intermediate or advanced computer and web search skills
      • Half assigned Exploratory, half Baseline.
    • For each domain, single block with 3 task types:
      • Easy and Difficult Page Collection Task [6min each]
      • Summarization Task [12min]
      • Keyword Generation Task [2min]
  • Procedure [2 hours]
    • Prior Knowledge Test
    • 1st Task Domain
      • With easy and difficult page collection tasks, summarization and keyword generation task.
      • NASA cognitive load questionnaire
    • 2nd Task Domain
      • Same battery of tasks and cognitive load questionnaire
    • 3rd Task Domain
    • Experimental Survey
  • Experimental Evaluation [Kammerer et al., CHI 2009]
    • Exploratory interface users:
      • performed more queries,
      • took more time,
      • wrote better summaries (in 2/3 domains),
      • generated more relevant keywords (in 2/3 domains), and
      • had a higher cognitive load.
    • Suggestive of deeper engagement and better learning.
    • Some evidence of scaffolding for novices in the keyword generation and summarization tasks.
  • The Team
  • Augmented Social Cognition: From Social Foraging to Social Sensemaking Image from: http://www.flickr.com/photos/ourcommon/480538715/
    • Research Vision: Understand how social computing systems can enhance the ability of a group of people to remember, think, and reason.
    • Living Laboratory: Create applications that harness collective intelligence to improve knowledge capture, transfer, and discovery.
    • http://asc-parc.blogspot.com
    • http://www.edchi.net
    • [email_address]
  • Enhanced Thumbnails (Andrew Faulring, Allison Woodruff, and Ruth Rosenholtz): enhanced vs. plain thumbnails
  • Popout Prism [Suh & Woodruff]
  • Social Search Survey [Brynn Evans, Ed H. Chi, CSCW2008]
    • Help understand the importance of:
      • social cues and information exchanges
      • vocabulary problems
      • distribution and organization
  • TagSearch Exploratory Focus: 3 kinds of search
    • Navigational (28%): you know what you want and where it is; existing search engines are OK
    • Transactional (13%): you know what you want to do; existing search engines are OK
    • Informational (59%): you roughly know what you want but don’t know how to find it; difficult for existing search engines, and the opportunity for TagSearch