Collecting a dataset of information behaviour in context

We collected human-computer interaction data (keystrokes, active applications, typed text, etc.) from knowledge workers in the context of writing reports and preparing presentations. This has resulted in an interesting dataset that can be used for different types of information retrieval and information seeking research. The details of the dataset are presented in this paper.

Published in: Science, Technology, Business

  1. http://www.swell-project.net Collecting a dataset of information behaviour in context. Maya Sappelli (TNO & Radboud University Nijmegen), Suzan Verberne (Radboud University Nijmegen), Saskia Koldijk (TNO & Radboud University Nijmegen), Wessel Kraaij (TNO & Radboud University Nijmegen). Supported by the Dutch National Program:
  2. Information behaviour in context
  3. Information behaviour in context
  4. But…
     • Controlled search experiment: lacks context for search; unnatural motive/behaviour
     • Uncontrolled data collection: privacy issues; noise
  5. Data Collection
  6. Data Collection
  7. Data Labeling: Event Stream to Event Blocks
     [Figure: example event block; legible labels: Outlook, Inbox]
  8. Data Labeling: presenting Event Blocks
     • Mechanical Turk
     • 9416 event blocks
     • Cohen's kappa: 0.78
  9. Data Labeling: result
     [Figure: distribution of labels. Categories: Einstein, Information Overload, Stress, Healthy, Privacy, Perth, Roadtrip, Napoleon, Indeterminable, No Label]
     • Total no. of event blocks: 9416
     • Average no. of event blocks per participant: 377
  10. Examples of analyses with the data
      • Stress-related behavioural research
      • Information-related behavioural research
        1. System-oriented
        2. User-oriented
  11. System-oriented analysis
      • Total # queries: 980
      • Of which followed by a click on a URL: 732
      • Of which followed by a switch to Word/PowerPoint: 125
      • Of which with Ctrl+C: 15
      • With a dwell time of >= 30 seconds: 44
  12. User-oriented analysis
  13. Discussion: challenges
      • Combining data from multiple sources is not trivial
      • Incomplete queries logged due to Google's query suggestions
      • Clicks without a change in window title (esp. Google Images)
      • Noise from browser logging
  14. Conclusions
      • Dataset of information behaviour of knowledge workers
      • Main contributions of the dataset:
        1. Combination of data types
        2. Natural information seeking behaviour
        3. In-context recordings
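
The agreement figure on slide 8 (Cohen's kappa 0.78) can be illustrated with a minimal sketch of the standard two-annotator kappa. The slides do not say how the value was computed or which annotators were compared, so this is not a reproduction of the authors' procedure. The function name `cohens_kappa` and the example label lists are assumptions made for illustration; the label strings are borrowed from the categories on slide 9.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; not taken from the slides.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four event blocks (not data from the dataset).
annotator_1 = ["Einstein", "Stress", "Stress", "Privacy"]
annotator_2 = ["Einstein", "Stress", "Privacy", "Privacy"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few labels dominate the distribution, as can happen with topic labels on event blocks.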
