
Leyline: A provenance-based desktop search

The most effective strategy for finding files is to carefully arrange them into folders. This strategy breaks down for teams, where organizational schemes often differ between team members. It also breaks down when information is copied and reused, as it becomes harder to track versions. As storage continues to grow and costs decline, the incentive to carefully archive old versions of files diminishes. It is therefore important to explore new and improved search tools. The most common approach is keyword search, though recalling effective keywords can be challenging, especially as repositories grow and information flows across projects. A less common alternative is to use provenance: information about the creation, use, and sharing of documents and their context, including collaborators. This paper presents a limited user study showing that provenance data is useful and desirable in search, and that an interface based on a graphical sketchpad is not only feasible but efficient.

  1. The Leyline: A Comparative Approach to Designing a Graphical Provenance-Based Search UI
     Soroush Ghorashi and Carlos Jensen, Oregon State University. HICSS 2013.
  2–5. What is the problem?
     Computers are increasingly "black holes" for information:
       —  Storage is abundant and cheap, so there is no incentive to delete or archive
       —  Collaboration and sharing are growing
       —  Information increasingly flows across devices
     More information is available, yet it is harder to (re)find anything.
     Manual folder navigation [Barreau and Nardi 1995; Teevan et al. 2004; Bergman et al. 2008]:
       —  Collaborators use conflicting naming schemes
       —  Overlapping projects introduce uncertainty
     Keyword search:
       —  Larger repositories and information reuse lead to long lists of hits for common keywords
       —  Multiple copies and drafts of files
  6. Solution? What about leveraging provenance to enrich file search?
       —  Provenance: the history of a document's ownership and transformations, as well as its sources and derivatives
          [Diagram: provenance events linking "RE: presentation draft" (attachment), data.html (copy/paste), presentation.ppt, and presentation-v2.ppt (save as)]
       —  Track provenance events: make them available in search queries and use them in results presentation
       —  Allow for fundamentally different types of queries
       —  People remember related documents [Gonçalves 2004; Blanc-Brude 2007]
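The event tracking described on this slide can be pictured as building a small document graph whose edges are provenance events. A minimal sketch of the idea, using the slide's example files (the class and method names are ours, not the Leyline's):

```python
# Illustrative sketch: record provenance events (save-as, copy/paste,
# attachment) as edges of a document graph that search can later query.
class ProvenanceLog:
    def __init__(self):
        self.edges = []  # (source, event, target) triples

    def record(self, source, event, target):
        self.edges.append((source, event, target))

    def related(self, resource):
        """All resources one provenance hop away from `resource`."""
        out = set()
        for s, e, t in self.edges:
            if s == resource:
                out.add(t)
            elif t == resource:
                out.add(s)
        return out

log = ProvenanceLog()
log.record("data.html", "copy/paste", "presentation.ppt")
log.record("presentation.ppt", "save as", "presentation-v2.ppt")
log.record("RE: presentation draft", "attachment", "presentation.ppt")
```

Querying `log.related("presentation.ppt")` then surfaces the email, the source web page, and the later draft together, which is exactly the kind of "related documents" recall the slide appeals to.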
  7. Research Goals
       —  Phase 1: Analyze information reuse, information flow, and provenance events in real-world settings
       —  Phase 2: Investigate the effectiveness of provenance cues in desktop search
       —  Phase 3: Develop and evaluate provenance-based search tools (if appropriate)
  8. Phase 1: Study Real-World Work Practices (2008–2010)
     3-month user study at Intel Corporation:
       —  Logged subjects' activities on their computers
       —  Cleaned the data of personal and sensitive information
       —  Recorded provenance and information-access events
     Participants:
       —  17 information workers, 43 workdays on average
       —  9 observation sessions
       —  Exit interview with each subject
     Findings:
       —  126,620 unique resources
       —  7,448 resources per subject (min 3,211; max 17,570; σ = 3,326)
     File use per person-day: Web 89.9, Email 73.7, Word 4.4, Excel 2.5, PowerPoint 2.1, Text 0.4, PDF 0.2; total 173.2
     Provenance events by type: CopyPaste 63%, SaveAs 15%, MoveFile 6%, FileRename 5%, DownloadFile 3%, AttachmentAdd 3%, AttachmentSave 3%, UploadFile 2%
     C. Jensen et al., "The life and times of files and information: a study of desktop provenance." In Proceedings of the 28th International Conference on Human Factors in Computing Systems (Atlanta, GA, April 10–15, 2010), CHI '10. ACM, New York, NY, pp. 767–776.
  9. Phase 1 contd. Provenance networks are more common than we expected!
       —  521 significant graphs (3+ nodes)
       —  Average of 5.8 resources per graph
       —  53.7% of files related to at least one other file in their own network
     C. Jensen et al., CHI '10, pp. 767–776.
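The "significant graphs (3+ nodes)" measure amounts to counting connected components of size three or more in the provenance link data. A minimal illustrative sketch (not the study's actual analysis code):

```python
# Illustrative sketch: treat provenance links as an undirected graph and
# report the sizes of "significant" connected components (3+ resources).
def significant_components(links, min_size=3):
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, sizes = set(), []
    for node in graph:
        if node in seen:
            continue
        # Depth-first flood fill to measure this component.
        stack, size = [node], 0
        seen.add(node)
        while stack:
            cur = stack.pop()
            size += 1
            for n in graph[cur]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        sizes.append(size)
    return [s for s in sizes if s >= min_size]
```

With links [("a","b"), ("b","c"), ("d","e")] this yields one significant component of size 3 and discards the isolated pair.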
  10. Phase 1 contd. Half of subjects remembered more about their documents after seeing a provenance graph.
      Subject quotes:
       —  "It looks like it comes from the IAP tool, and all the green boxes are my Excel spreadsheets that I exported to. The word documents are probably what I copied the Excel data to, probably for email."
       —  "I recall uploading those to the SharePoint site!"
       —  "Oh, I see what's going on. I tend to open a spreadsheet and sometimes I'll have more than one open at the same time…"
       —  "2.4 might have been embedded in a doc, so I had to copy it out from there."
       —  "Yeah, that's what I did, I turned it into Excel… I saved it, and then I changed the name because I wanted to make sure it was distinguished from other files I have with the same name for a different group."
       —  "Looks like I copied and pasted from the website into a doc… It's kind of complicated what I did here. I took 2.2, copied and pasted info into an Excel spreadsheet. And then yeah, there's number 7, a spreadsheet as well."
      C. Jensen et al., CHI '10, pp. 767–776.
  11–12. Can We Use Provenance More Directly?
      Most traditional keyword search tools use textual queries. What about drawing queries instead?
  13. Phase 2: Provenance in Search. Is it appropriate? Can provenance be used effectively in search?
       —  How complex a query do we need to find a file?
       —  List all unique walks in the provenance graphs
          —  Find the longest repeating strings for each subject
          —  Worst-case unique query: longest repeating string + 1
          —  With/without provenance event types, to examine their impact
      Example walk: Outlook--AS--Word--CP--PowerPoint--SA--PowerPoint--CP--PowerPoint
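The worst-case analysis on this slide (longest repeating string + 1) can be sketched naively: any contiguous run one step longer than the longest run that occurs in more than one place is guaranteed to be unique. A deliberately simple O(n³) sketch, with function names of our choosing:

```python
def longest_repeated_run(walks):
    """Length of the longest contiguous run of steps occurring at more
    than one position across all walks (naive enumeration)."""
    seen = {}   # sub-walk -> first (walk index, offset) where it appeared
    best = 0
    for wi, walk in enumerate(walks):
        for i in range(len(walk)):
            for j in range(i + 1, len(walk) + 1):
                key = tuple(walk[i:j])
                if key in seen and seen[key] != (wi, i):
                    best = max(best, j - i)
                else:
                    seen.setdefault(key, (wi, i))
    return best

def worst_case_query_length(walks):
    # One step longer than the longest repeated run is always unique.
    return longest_repeated_run(walks) + 1
```

For walks ["Outlook","AS","Word"] and ["Outlook","AS","Excel"], the longest repeated run is ("Outlook","AS"), so the worst-case unique query needs 3 steps; the slide's real repositories of ~7,500 items needed medians of only 4–4.5.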
  14. Phase 2 contd.
       —  Maximum query length for a repository of ~7,500 items:
          —  Considering the type of provenance events: 3 to 9, median 4
          —  Without considering the type of provenance events: 3 to 10, median 4.5
       Provenance events like copy/paste and versioning are too common to add value!
       —  Provenance search grows linearly: 1 node per 200 links
       Provenance can be used to narrow the search space quickly.
  15. Tool Analysis: categorizing tools that use provenance-like data to enhance search
       —  Provenance types
       —  Provenance monitoring
       —  Provenance use
       —  UI approach
       —  Evaluation
  16. Tool Analysis contd.
      Feldspar
       —  Provenance types: file meta-data, keywords, relations between user resources
       —  Provenance monitoring: extracts relations from Google Desktop's database using its API
       —  Provenance use: query formulation, search process (real-time results updating)
       —  UI approach: flow-chart-like query builder, list view of results
       —  Evaluation: canned data, static model, limited within-subjects user study
      Quill
       —  Provenance types: meta-data such as author, storage place, date, physical place tag (home, work, etc.)
       —  Provenance monitoring: built-in system monitor records meta-data about the user's documents, email attachments, web pages, applications, and calendar
       —  Provenance use: query formulation, search process (real-time results updating)
       —  UI approach: narrative-based, list of resource thumbnails
       —  Evaluation: multiple user studies
      Stuff I've Seen (SIS)
       —  Provenance types: file meta-data (such as kind, date, author, email attributes)
       —  Provenance monitoring: Microsoft Desktop Search database; fuzzy matching ("car" and "cars" are the same); fielded search (author is "john doe")
       —  Provenance use: query formulation, search process, results presentation
       —  UI approach: text input with selectable filters, list view of results with preview and meta-data
       —  Evaluation: longitudinal study using real data on subjects' PCs (234 people), 6 weeks
      Phlat
       —  Provenance types: file meta-data (such as kind, date, author, email attributes); contextual cues such as user-defined tags
       —  Provenance monitoring: Microsoft Desktop Search database; extra meta-data as tags (labeling system)
       —  Provenance use: query formulation, search process, results presentation
       —  UI approach: text input with selectable filters, list view of results with preview and meta-data
       —  Evaluation: longitudinal study using real data on subjects' PCs (225 people), 8 months
      YouPivot
       —  Provenance types: environmental factors as contextual cues, user-defined marks
       —  Provenance monitoring: integrated system monitor records contextual cues and their occurrences
       —  Provenance use: query formulation, search process
       —  UI approach: textual input and selectable filters, list view of results
       —  Evaluation: canned data, limited within-subjects user study
  17. Tool Analysis: Feldspar [Chau et al. 2008]
       —  Desktop search
       —  Uses associations between files and resources, extracted from the Google Desktop database
       —  Keyword and meta-data search
       —  Flowchart-like user interface
       —  Real-time results, fast
       —  Evaluated with canned data in a within-subjects study
  18. Tool Analysis: Stuff I've Seen [Dumais et al. 2003] and Phlat [Cutrell et al. 2006]
       —  Similar to Windows Desktop Search
       —  Keyword and meta-data search
       —  Rank results using contextual cues
       —  Textual input; list view of results with snippet and meta-data
       —  Unified labeling (Phlat)
       —  Longitudinal studies
  19. Tool Analysis: YouPivot [Hailpern et al. 2011]
       —  Searches web browsing history
       —  Internal system monitor
       —  Uses keywords for search and contextual cues to filter results
       —  Timeline view of user activities
       —  Textual input; list view of results
       —  TimeMarks to filter results
       —  Evaluated with canned data in a within-subjects study
  20. Phase 3: Design Goals
       —  Use dynamic relations between files
       —  Integrate with keyword search
       —  Graphical UI
       —  Allow all kinds of graphical queries
       —  Internal system monitor
       —  Result exploration
  21. Phase 3: System Requirements
       —  Provenance + keyword search
       —  Streamline query composition with a drag-and-drop graphical sketchpad
       —  Allow flexible exploration and discovery
       —  Integrate with Windows Explorer to allow exploration of workflow and information provenance
  22–24. Phase 3 contd. Exact pattern matching is NP-complete (the subgraph isomorphism problem)!
       —  Introduce * (wildcard) links
          —  Partial matching
          —  Easier to solve
          —  Better matches user recall
       —  Use the G-Ray algorithm [Tong et al. 2007]
          —  Best-effort matching
          —  Fast, scalable, flexible, and forgiving
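G-Ray itself performs best-effort subgraph matching; as a much simpler illustration of what a "*" link buys, the sketch below treats a wildcard link as reachability over any chain of provenance events, while a typed link must match a single edge (graph layout and names are our own, not the Leyline's implementation):

```python
from collections import deque

def matches(graph, src, dst, link):
    """graph: {node: [(event, neighbor), ...]} adjacency lists.
    A typed link matches one direct edge with that event type;
    a "*" link matches any path (plain BFS reachability)."""
    if link != "*":
        return any(e == link and n == dst for e, n in graph.get(src, []))
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for _, n in graph.get(node, []):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return False
```

So a query "email --*--> ppt" succeeds even when the email only reaches the presentation through an intermediate document, which is why wildcard links tolerate the gaps in users' recall that exact isomorphism would punish.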
  25. Phase 3: The Leyline
  26. Phase 3: Preliminary Evaluation. Is the UI approach reasonable?
       —  User study
          —  File repository modeled after those found at Intel
       —  Participant selection
          —  Questionnaire to examine knowledge of search tools
          —  Graduate students
       —  Interactive tutorial
       —  9 experiment tasks, e.g. "Find the word document you created using information copy/pasted from an email, a web page, and an excel document. Find the emails that have this word document as an attachment."
          —  Tasks ordered randomly
          —  Think-aloud protocol
          —  4 minutes for each task
       —  Exit interview about their experience
      S. Ghorashi, C. Jensen, "Leyline: provenance-based search using a graphical sketchpad." In Proceedings of the 6th Symposium on Human-Computer Interaction and Information Retrieval (HCIR '12). ACM, New York, NY, Article 2, 10 pages.
  27. Phase 3: Preliminary Evaluation contd.
       —  Average completion time: 106 seconds
          —  Simple tasks: 72–93 seconds
          —  Hard tasks: 126–155 seconds
       —  Query complexity (#nodes & #edges)
          —  Average of 2.8 nodes and 2 edges
          —  System scales well (completion time vs. complexity)
       —  Observations
          —  Importance of the target document
          —  Subjects worked on one resource or relation at a time
          —  Marked learning effect
       —  Interviews
          —  Overall likability rating: 4.2 out of 5 (σ = 0.4)
          —  Subjects wanted Leyline in real life
          —  No one complained about the effort/time required
       —  Areas for improvement
          —  Query composition history panel
          —  Customization options
          —  Support for more resource types
      S. Ghorashi, C. Jensen, HCIR '12.
  28. Conclusion
       —  Provenance events are very common in real-world settings, and potentially helpful in search
       —  Provenance alone can quickly and effectively identify unique files/resources (assuming perfect recall)
       —  A graphical sketchpad is a viable UI for query composition
          —  It will not replace keyword search, but it is a valuable addition
       —  Users quickly learned how to use our system, and wanted the tool
  29. What about the future?
       —  Incorporate the feedback and lessons learned into a new prototype
       —  Expand the feature set to include:
          —  Auto-completion and suggestion features to speed up the search process
          —  Support for a broader set of files and resources
          —  Possibly support for other computer platforms
       —  Prepare for a longitudinal study
          —  How do people adapt to and use the Leyline?
          —  How does the Leyline scale with a large database?
          —  Does the Leyline change exploration?
          —  Does the Leyline work in collaborative environments?
  30. Thank you
       —  Thanks to Intel for early funding and subjects!
       —  For more information:
          —  Soroush Ghorashi (ghorashi@eecs.oregonstate.edu)
          —  Carlos Jensen (cjensen@eecs.oregonstate.edu)
