Phlat: Phast & phlexible personal search and organization. Ed Cutrell, Microsoft Research, Adaptive Systems & Interaction Group
  • Phlat: Phast & phlexible personal search & organization. We are quickly approaching the time when each of us will have terabytes of personal electronic information: communications, documents, pictures, music, video, web history, and more. I can’t even keep up with filing my vacation pictures! How am I supposed to find anything in this ocean of content, metadata and associations? To successfully navigate these deep waters, we need flexible interfaces that map onto the associative nature of human memory; a search box just isn’t enough. We have built several interfaces that exploit the rich faceted metadata associated with personal information. These interfaces allow for intuitive navigation of huge data stores in a manner better suited to the quirks of human memory. Phlat is an interface for combining structured and unstructured desktop search and explores techniques for user tagging of personal objects. Best of all, Phlat runs on top of Windows Desktop Search, so you can try it for yourself! Memory Landmarks explores how to leverage the fact that human memory is based on events, not pure time. And finally, Implicit Query (IQ) is an attempt to contextualize search: the system searches your store for you based on what you’re doing! Try Phlat now at http://phlat to see it for yourself!
  • Many entry points: today there are many search entry points (and organizational systems): web, Outlook, file, help. To find things, you need to remember where you saw them. Slow: you can find things in <1 sec. on the web, but it often takes minutes on your own machine. Disparate interfaces and content, plus slowness -> confusing and frustrating.
  • Everybody’s doin’ it! Reuse is important: 70-80% of seen web pages are ‘revisits’ (similar statistics for access to library books, files, cached content, and human memory retrieval)
  • These techniques allow us to get a broad sampling of user activities in the context of natural use of the system: real usage contexts, multiplicity of methods.
  • ***** Add Bayesian Network for page memorability *****
  • Time: 24.3, 18.4, 24% Interact 2003 paper …
  • Context – task, user (personalization). Finding / Using … Levels (desktop, intranet, web). Algs and UI.
  • <<Note on index size>> 3k web pages; 20k files; 50k mail messages. 10s of gig of content -> 185 meg index. Changes at rate of 1-2 meg / week.
    Short queries suffice – size of store, quick iteration, lots of cues.
    QTypes: People/places (0.29), Computers/internet (0.25), Science/history (0.16) -> (0.70) + (0.20 unknown)
    SIS Index Size
    *** Fri, Sept 26
      <number> files: 19302; mail: 45847; web: 1011; total: 66160 (sum, so some duplication w/ attachments); totalsis: 61747?
      <file size> pst: 1.28gig; exchange: 102meg; c:my documents: 1.73gig; d:papers: 1.67gig; d:sigir-chair: 600meg; total: 5.38gig
      <index size> C:Program FilesRSSearchDataApplicationsRSAppProjectsMyIndexBuildIndexerCiFiles index: 191 meg
    *** Fri Oct 3
      <number> files: 19562; mail: 46629; web: 1372; total: 67563 (sum); totalsis: 64179
      <file size> pst: 1.31gig; exchange: 88meg; c:my documents: 1.73gig; d:papers: 1.69gig; d:sigir-chair: 606meg; total: 5.42gig
      <index size> C:Program index: 192meg, plus some other ci files
    *** Fri Oct 10
      <number> files: 19729; mail: 46599; web: 1662; total: 67990
      <file size> pst: 1.32gig; exchange: 82meg; c:my documents: 1.73gig; d:papers: 1.70gig; d:sigir-chair: 606meg; total: 5.44gig
      <index size> C:Program index: 194meg, plus some other ci files
  • Slopes: Email has higher intercept (higher access), and steeper slope (shorter effective life). Half-life: web (11), email (36), files (88). Search before-after: Files: 1.0, 0.6, 40%; Email: 2.6, 1.5, 42%; Web: 5.0, 4.0, 20%.

    1. Phlat: Phast & phlexible personal search and organization
        Ed Cutrell, Microsoft Research, Adaptive Systems & Interaction Group
    2. Outline
        • Personal search today
        • Search with Phlat
            • With: Susan Dumais, Daniel Robbins, Raman Sarin
            • Search/Browse/Filter
            • Tagging
            • Evaluation & Lessons
        • Memory Landmarks
        • Implicit Query
    3. Search in 2004 …
        • Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, notes)
        • Often slow
        “… the No.1 question we're trying to solve [in Longhorn] is ‘Where's my stuff?’ Right now, file space on any PC is a cesspool.” (Bill Gates, FORTUNE interview, June 23, 2002)
    4. Stuff I’ve Seen
        • In 2001, we built SIS
            • Deployed to ~3000 people in MSFT
            • Allowed us to study how people use (& want to use) personal search
            • Limited prototype but very successful
    5. Search Today
        • Desktop search hits the bigtime!
            • MSN Toolbar: Windows Desktop Search
            • Google DS
            • Yahoo! Toolbar (X1 DS)
            • Copernic DS
            • Apple OS X Tiger with Spotlight
            • Many others…
        • Unified index of full-text & metadata for different stores
        • Re-use vs. initial discovery
    6. Simple UI on Rich Client
    7. Search with Phlat
        • Phlat asks: Can we design an intuitive and flexible interface that replaces the traditional search/browse dichotomy?
            • Topple the tyranny of the rigid hierarchical file system! (folders: keep the metadata and toss the metaphor)
            • Embrace the power of the index! (real-time interaction on any axis)
        • Powered by Windows Desktop Search
            • Phlat is a client shell that runs against the index built & maintained by Windows DS (MSN Toolbar)
            • Queries routed through SIS Communications layer (C#)
    8. Faceted property filtering
        • 5 canonical properties to filter on (extensible)
        • Prop filters integrated with query
        • Type-in for each property (wordwheel)
        • Search==Browse
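The "Search==Browse" idea on this slide can be illustrated with a minimal sketch in which property filters and a full-text query pass through one function. This is not Phlat's implementation (which runs against the Windows Desktop Search index); the `Item` class, the `search` function, and the property names `type` and `person` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A hypothetical indexed item: free text plus named properties."""
    text: str
    props: dict = field(default_factory=dict)


def search(items, query="", filters=None):
    """Return items matching every query word and every property filter.

    An empty query with filters acts as browsing; a query with no
    filters acts as plain full-text search.
    """
    words = query.lower().split()
    filters = filters or {}
    results = []
    for item in items:
        text = item.text.lower()
        # Substring matching keeps the sketch short; a real index would tokenize.
        if not all(word in text for word in words):
            continue
        if all(item.props.get(name) == value for name, value in filters.items()):
            results.append(item)
    return results


items = [
    Item("Phlat design spec", {"type": "file", "person": "Dan"}),
    Item("Re: Phlat demo", {"type": "email", "person": "Susan"}),
    Item("Lunch on Friday?", {"type": "email", "person": "Dan"}),
]

browse = search(items, filters={"type": "email"})                    # filter only
narrowed = search(items, query="phlat", filters={"type": "email"})   # query + filter
```

Because both paths share one code route, adding or removing a filter never changes how the query itself is interpreted, which is one way to read "prop filters integrated with query".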
    9. Tagging
        • Apply single set of user-generated metadata to all files, email, etc.
        • Allow but not require hierarchy
        • Tags are directly associated with files (NTFS or MAPI props)
        • Need to hook CFD
        • Delay associated with query & indexer
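One way to "allow but not require hierarchy" is to store tags as plain strings and treat a separator as an optional path. The sketch below assumes a `/` separator and an in-memory dictionary; both are illustrative assumptions, and nothing here reflects how Phlat stores tags in NTFS or MAPI properties.

```python
def tag_matches(item_tags, wanted):
    """True if any tag equals `wanted` or sits beneath it in the hierarchy."""
    prefix = wanted + "/"
    return any(tag == wanted or tag.startswith(prefix) for tag in item_tags)


# Hypothetical store mapping item identifiers to their tags.
tags_by_item = {
    "budget.xls": {"Work/Finance"},    # hierarchical tag
    "trip.jpg": {"Vacation"},          # flat tag, no hierarchy needed
    "memo.msg": {"Work", "Urgent"},    # several flat tags on one item
}


def items_tagged(wanted):
    """Return the sorted names of items carrying `wanted` or a descendant tag."""
    return sorted(
        name for name, tags in tags_by_item.items() if tag_matches(tags, wanted)
    )
```

Querying for `Work` returns items tagged `Work` or `Work/Finance`, while flat tags such as `Vacation` behave exactly as they would without any hierarchy.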
    10. Demo
    11. Phlat Demo Points
        • Full-text & property search rendered similarly: no stuck filters, browse/search continuum
        • Sort and filter results: stable sort, static headers
        • Wordwheel for properties: hierarchy
        • Tagging (hierarchy & auto-open to level for tagged items)
        • ? Right-click and other advanced functionality
        • ? Saved queries, Standing queries
        • ? MyTopics (Arun Surendran)
    12. Evaluating Phlat
        • Internal deployment
            • ~500 downloads
            • Users include: program management, test, sales, development, administrative, executives, etc.
        • Research techniques
            • Free-form feedback
            • Questionnaires; Structured interviews
            • Usage patterns from log data
            • Gaze tracking studies
            • Lab studies for richer UI (e.g., timeline, trends)
    13. User Logging
        • Phlat records a broad range of query characteristics to help us understand the characteristics of usage.
            • Age of items opened
            • Characteristics of query and filters used
            • Number of tags applied
            • Cycle of query iteration
            • Special instrumented version for gaze tracking research
    14. Phlat Observations 1 (unified access)
        • Metadata quality is variable
            • Email: rich, pretty clean
            • Web: little, not very useful for retrieval
            • Files: some, but often wrong
            • Human annotation: don’t depend on it… (but good UI in authoring environments can help)
        • Memory depends on abstractions
            • Date is dependent on the object!
                • Appointment, when it happens
                • File, when it is changed
                • Email and Web, when it is seen
            • People
                • To, From, Cc, Author, Artist
            • Document names
    15. Phlat Observations 2 (tricky business)
        • Internet cache & web history are very useful but tricky areas
            • Cache can be confusing, but re-access is important
            • Access to temp files is GREAT
        • Tagging
            • Hierarchy important for tag organization and cognitive assistance, but flexibility is key (think organic)
            • MUST make tagging UI ubiquitous and available at inflection points of information consumption
            • Sharing tricky
    16. Timeline w/ Landmarks
        • Timeline interface
            • (with Merrie Ringel & Eric Horvitz)
        • Augmented with landmarks as pointers into human memory
            • General: holidays, world events
            • Personal: important photos, appointments
            • Heuristics or Bayesian models to identify memorable events
    17. SIS, Timeline w/ Landmarks Search Results
        • Memory Landmarks
        • General (world, calendar)
        • Personal (appts, photos)
        • <linked by time to results>
        [Figure label: Distribution of Results Over Time]
    18. Demo
    19. Timeline Experiment
        [Chart: Search Time (s), axis 0 to 30; With Landmarks vs. Without Landmarks; conditions: Dates Only, Landmarks + Dates]
    20. Contextualizing Search
        • Search is not the end goal …
        • Finding:
        • Always available search from task bar
        • Search from within apps
            • Plug-ins w/ queries to Phlat/Desktop Search
            • Research pane
        • Implicit queries
            • Proactively finding results
        • Using:
        • e.g., drag/drop, tag, etc.
    21. Implicit Queries
        • Background search on top k interesting terms from message, based on user’s index: Score = tf_doc / log(tf_corpus + 1)
        • Quick searches for people associated with the message and Subject.
        • Top N hits for IQ based on size of window.
        • Open items directly.
        • Go to SIS for immediate detailed search. Box autofills with IQ search terms.
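The scoring rule on this slide, Score = tf_doc / log(tf_corpus + 1), can be sketched as follows. The whitespace tokenizer, the `top_terms` name, the sample counts, and the choice to skip terms absent from the user's index are all assumptions made for illustration; the slide does not say which logarithm base is used (the base does not affect the ranking).

```python
import math
from collections import Counter


def top_terms(message, corpus_tf, k=3):
    """Rank message terms by tf_doc / log(tf_corpus + 1) and keep the top k."""
    tf_doc = Counter(message.lower().split())
    scores = {}
    for term, count in tf_doc.items():
        tf_corpus = corpus_tf.get(term, 0)
        if tf_corpus == 0:
            # log(0 + 1) = 0 would divide by zero; skipping is this sketch's choice.
            continue
        scores[term] = count / math.log(tf_corpus + 1)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical term counts from a user's personal index.
corpus_tf = {"the": 5000, "demo": 40, "phlat": 12, "is": 3000, "on": 2500, "friday": 90}
terms = top_terms("the phlat demo is on friday", corpus_tf, k=3)
```

Terms that are frequent in the message but rare in the user's own index score highest, so common words such as "the" fall to the bottom without needing a stop list.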
    22. IQ Demo
    23. Learn More
        • Phlat is usable NOW! http://phlat
        • Design goes well beyond the prototype. Feel free to view spec: http://team/sites/phlat
        • More info, papers, etc. http://research.microsoft.com/~cutrell
        • Related projects at MSFT
            • Memex, Tesla, Longhorn
    24. Thank You!
    26. SIS Usage Data
        • Detailed analysis for 234 people, 6 weeks usage
        • Personal store characteristics
            • 5k – 100k items
        • Query characteristics
            • Short queries (1.6 words)
            • Few advanced operators or fielded search in query box (~7%)
            • Frequent use of query iteration (48%)
                • 50% refined queries filtered – type, date (most common)
                • 35% refined queries changed the query
                • 13% refined queries re-sorted
        • Query content
            • Vs. Spink et al.’s analysis of web queries
            • Importance of people
                • 29% of the queries involve people’s names
    27. SIS Usage Data, cont’d
        • Characteristics of items opened
        • File types opened
            • 76% Email
            • 14% Web pages
            • 10% Files
        • Age of items opened
            • 7% today
            • 22% within the last week
            • 46% within the last month
        • Ease of finding information
            • Easier after SIS for web, email, files
            • Non-SIS search decreases for web, email, files
        Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02
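The fitted line Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02 describes a power-law decay in how often items of a given age are opened. The sketch below evaluates it; the slide does not state the logarithm base or the units of Freq, so base 10 is an assumption, and `predicted_freq` is a hypothetical helper name.

```python
import math

SLOPE = -0.68      # from the slide
INTERCEPT = 2.02   # from the slide


def predicted_freq(days_since_seen, base=10.0):
    """Evaluate Log(Freq) = SLOPE * log(DaysSinceSeen) + INTERCEPT.

    The base of the logarithm is an assumption (base 10 by default).
    """
    log_freq = SLOPE * math.log(days_since_seen, base) + INTERCEPT
    return base ** log_freq


# Ratio of predicted access frequency when an item's age doubles.
ratio = predicted_freq(20) / predicted_freq(10)
```

Whatever base is assumed, doubling an item's age multiplies the predicted frequency by 2 ** -0.68, roughly 0.62, so only the slope is needed to compare items of different ages.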
