Semantically Reconnecting Fragmented Information through User Activity Monitoring
Hinnerk Brügmann, 2011/02/17

Semantically Reconnecting Fragmented Information through User Activity Monitoring (Wi2011)

Today, information items on users' workstations are usually stored in separate collections depending on their format. This results in a disconnect between information systems and user needs, leading to high lookup times during task-related information retrieval. This paper presents an approach to reduce document-based information fragmentation by semantically reconnecting electronic documents to each other without imposing additional training or tagging workload on the user.

To this end, the actions knowledge workers perform on their desktops are transparently monitored to analyze each user's interaction with the computer system. These action metadata are then clustered by the superordinate activities performed by the user. Finally, documents attached to window instances within the identified activity clusters are semantically related to each other, reducing the fragmentation of the information they contain.

This allows subsequent associative information discovery by navigating from one document instance to other related document instances. A prototypical implementation and an evaluation in a small-scale test setup indicate the validity of the approach.

Published in: Technology, Education

Semantically Reconnecting Fragmented Information through User Activity Monitoring (Wi2011)

  Semantically Reconnecting Fragmented Information through User Activity Monitoring
    • Hinnerk Brügmann, 2011/02/17
  Motivation
    • Rapidly rising amount of unstructured information in personal and enterprise environments
      1. High effort to locate required information
      2. Potential redundancy
      3. Orphaned documents
      4. Outdated document versions
  Information is often organized in an Application-specific Way
    • Information items are stored in separate collections depending on their formats:
      • Documents in a file system folder hierarchy (e.g. My Documents folder)
      • E-mails in a separate mailbox hierarchy
      • Bookmarks to favorite web sites in a browser hierarchy
  Approach: Relate Documents semantically to each other as well as to Business Entities
    • Figure (example graph), node labels: charts.xls; documentation.doc; Product A; Linda Ray; Charles Copper; Howard Lyman; DaniSawberg; Julia Stern
    • Figure (example graph), relation labels: works on the same project as; knows; is an author of; documents; has to review
  (Meta-)Information as a Source for semantic Relations
    • Figure labels: maintenance operations; concurrently open documents; provenance; contextual; personal access and usage; collaborative access and usage; compliance status; user classification; static; administrative access rights; static file attributes; inherent; content
    • Figure axes: Personal Domain; Enterprise Domain
  Existing Research Approaches
    • Figure labels: Activity-based Relation Building; maintenance operations; concurrently open documents; provenance; contextual; personal access and usage; collaborative access and usage; compliance status; user classification; static; administrative access rights; static file attributes; Content-based Relation Building; inherent; content
    • Figure axes: Personal Domain; Enterprise Domain
  Focus of the presented Approach
    • Figure labels: Activity-based Relation Building; maintenance operations; concurrently open documents; provenance; contextual; personal access and usage; collaborative access and usage; compliance status; user classification; static; administrative access rights; static file attributes; Content-based Relation Building; inherent; content
    • Figure axes: Personal Domain; Enterprise Domain
  Components of a User Task
    • Task
      • Activity (Motive)
      • Action (Goal)
      • Operation (Condition)
    • Source: Kuutti (1996).
  Action Context
    • Action Scope: User Operation (UO) and Workplace Environment (WE)
    • Time: before, during, after
  Action Context
    • Action Scope: User Operation (UO) and Workplace Environment (WE)
    • Time: before, during, after
      • during: (UO1) copying text from price-list.doc into new document sales-offer.doc
  Action Context
    • Action Scope: User Operation (UO) and Workplace Environment (WE)
    • Time: before, during, after
      • before: (WE0) opened document price-list.doc
      • during: (WE1) open documents price-list.doc and sales-offer.doc
      • during: (UO1) copying text from price-list.doc into new document sales-offer.doc
      • after: (WE2) opened document price-list.doc
      • after: (UO2) saving document sales-offer.doc into folder customer-alpha on the local file system
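The before/during/after structure of an action context can be captured in a small record type. The following is a minimal sketch, not the author's implementation: the class name `ActionContext` and all field names are hypothetical, and only the example values (WE0 to WE2, UO1, UO2) are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ActionContext:
    """Hypothetical record of one action context (names are illustrative)."""

    # Workplace environment: documents open before, during and after the action.
    workplace_before: set[str] = field(default_factory=set)
    workplace_during: set[str] = field(default_factory=set)
    workplace_after: set[str] = field(default_factory=set)
    # User operations observed during and after the action.
    operations_during: list[str] = field(default_factory=list)
    operations_after: list[str] = field(default_factory=list)

    def documents_in_scope(self) -> set[str]:
        """Return every document seen in any phase of the action."""
        return self.workplace_before | self.workplace_during | self.workplace_after


# Values transcribed from the slide example.
ctx = ActionContext(
    workplace_before={"price-list.doc"},                     # WE0
    workplace_during={"price-list.doc", "sales-offer.doc"},  # WE1
    workplace_after={"price-list.doc"},                      # WE2
    operations_during=["copy text from price-list.doc into sales-offer.doc"],  # UO1
    operations_after=["save sales-offer.doc into folder customer-alpha"],      # UO2
)
print(sorted(ctx.documents_in_scope()))
```

Keeping the three workplace snapshots separate preserves the information that sales-offer.doc only appears while the copy operation is under way, which is the kind of co-occurrence the approach relies on.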
  Multitasking
    • Figure (timeline), events: Begin primary task; Alert for secondary task; Begin secondary task; End secondary task; Resume primary task
    • Figure (timeline), intervals: Interruption lag; Resumption lag
    • Figure (timeline), activities: Rehearse primary task problem; Clean up primary task; Do primary task; Do secondary task; Recall primary task problem; Do primary task
  Snapshot Data grouped by Time Spans
    • Sensor reading (08:00 – 16:00), monitor configurations 1024 x 768 and 1280 x 800
      • Time Span 1 (visible windows 08:00 – 08:01): Metadata window A, Metadata window B, ...
      • Time Span 2 (visible windows 08:01 – 08:03): Metadata window A, Metadata window C, ...
      • ...
  Snapshot Data grouped by Time Spans
    • Sensor reading (08:00 – 16:00), monitor configurations 1024 x 768 and 1280 x 800
      • Time Span 1 (visible windows 08:00 – 08:01): Metadata window A, Metadata window B, ...
      • Time Span 2 (visible windows 08:01 – 08:03): Metadata window A, Metadata window C, ...
      • ...
    • Window metadata: X/Y/Z Position; Height; Width; Window Handle; Parent Window Handle; Application ID; Focus indicator
  Snapshot Data grouped by Time Spans
    • Sensor reading (08:00 – 16:00), monitor configurations 1024 x 768 and 1280 x 800
      • Time Span 1 (visible windows 08:00 – 08:01): Metadata window A, Metadata window B, ...
      • Time Span 2 (visible windows 08:01 – 08:03): Metadata window A, Metadata window C, ...
      • ...
    • Window metadata: X/Y/Z Position; Height; Width; Window Handle; Parent Window Handle; Application ID; Focus indicator
    • Window metadata: File Path; Document Title; Textual Content
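One way to picture the snapshot data is as a list of timestamped sensor readings that are merged into time spans. The sketch below is an illustration under stated assumptions, not the prototype's code: the Python names (`WindowMetadata`, `group_into_time_spans`) are hypothetical, and it assumes that a new time span starts whenever the set of visible window handles changes, which the slides suggest but do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowMetadata:
    """Per-window fields listed on the slide; the Python names are illustrative."""

    window_handle: int
    parent_window_handle: int
    application_id: str
    x: int
    y: int
    z: int
    width: int
    height: int
    has_focus: bool
    file_path: str = ""
    document_title: str = ""
    textual_content: str = ""


def group_into_time_spans(snapshots):
    """Merge consecutive snapshots that show the same set of window handles.

    `snapshots` is a list of (timestamp, [WindowMetadata, ...]) sorted by time.
    Returns (first_timestamp, last_timestamp, windows) tuples.
    ASSUMPTION: a span ends as soon as the set of visible handles changes.
    """
    spans = []
    for timestamp, windows in snapshots:
        handles = frozenset(w.window_handle for w in windows)
        if spans and spans[-1][3] == handles:
            start, _, first_windows, _ = spans[-1]
            spans[-1] = (start, timestamp, first_windows, handles)
        else:
            spans.append((timestamp, timestamp, windows, handles))
    return [(start, end, windows) for start, end, windows, _ in spans]


def make_window(handle, app):
    """Build a window with placeholder geometry for the example."""
    return WindowMetadata(handle, 0, app, 0, 0, 0, 800, 600, False)


a, b, c = make_window(1, "word"), make_window(2, "excel"), make_window(3, "browser")
snapshots = [("08:00", [a, b]), ("08:01", [a, c]), ("08:02", [a, c])]
spans = group_into_time_spans(snapshots)
```

Here windows A and B form the first span, and the two later readings showing A and C collapse into a second span, mirroring the Time Span 1 and Time Span 2 grouping on the slide.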
  Clustering Window Instances
    • Clustering within the same Time Span:
      • Same application
      • Parent window id
      • Textual similarity of title
      • Textual similarity of content
  Clustering Window Instances
    • Clustering within the same Time Span:
      • Same application
      • Parent window id
      • Textual similarity of title
      • Textual similarity of content
      • Parallel visibility
    • Text processing for the similarity measures:
      • Removal of application name from title
      • Porter stemming
      • Language detection using n-gram comparison
      • Stopword removal
      • Conversion to word list and application of a Weighted Tag Similarity algorithm
  Clustering Window Instances
    • Clustering within the same Time Span:
      • Same application
      • Parent window id
      • Textual similarity of title
      • Textual similarity of content
      • Parallel visibility
    • Text processing for the similarity measures:
      • Removal of application name from title
      • Porter stemming
      • Language detection using n-gram comparison
      • Stopword removal
      • Conversion to word list and application of a Weighted Tag Similarity algorithm
    • Clustering across multiple Time Spans:
      • Same window handle
      • Same application
      • Textual similarity of title
      • Textual similarity of content
  Clustering Window Instances
    • Clustering within the same Time Span:
      • Same application
      • Parent window id
      • Textual similarity of title
      • Textual similarity of content
      • Parallel visibility
    • Similarity of window instances weighted by their respective visibility on screen
    • Adding semantic statements relating window instances
    • Reification statement expressing reliability of relation (scaled combined similarity)
    • Clustering across multiple Time Spans:
      • Same window handle
      • Same application
      • Textual similarity of title
      • Textual similarity of content
  Example of Window Instances related across Time Spans
    • [Figure]
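The title comparison steps named on the clustering slides can be sketched roughly as follows. This is a simplified illustration and makes several assumptions that are not in the slides: the stopword list is a tiny hypothetical sample, the application name is assumed to be a trailing " - <application>" suffix, `crude_stem` is only a placeholder for Porter stemming, language detection is omitted, a plain Jaccard overlap stands in for the Weighted Tag Similarity algorithm (whose details the slides do not give), and the visibility weighting by the smaller visible fraction is a guess at one plausible scheme.

```python
import re

# ASSUMPTION: tiny illustrative stopword list; a real one is language-specific.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def strip_application_name(title: str, app_name: str) -> str:
    """Remove a trailing ' - <application>' suffix from a window title."""
    suffix = " - " + app_name
    return title[: -len(suffix)] if title.endswith(suffix) else title


def crude_stem(word: str) -> str:
    """Placeholder for Porter stemming: strips a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def title_terms(title: str, app_name: str) -> set[str]:
    """Lowercase, tokenize, drop stopwords and stem the remaining words."""
    cleaned = strip_application_name(title, app_name).lower()
    words = re.findall(r"[a-z0-9]+", cleaned)
    return {crude_stem(w) for w in words if w not in STOPWORDS}


def title_similarity(terms_a: set[str], terms_b: set[str]) -> float:
    """Jaccard overlap, used here as a stand-in similarity measure."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def visibility_weighted(similarity: float, visible_a: float, visible_b: float) -> float:
    """ASSUMPTION: scale by the smaller visible screen fraction (0.0 to 1.0)."""
    return similarity * min(visible_a, visible_b)


a = title_terms("Sales offers for customer alpha - Microsoft Word", "Microsoft Word")
b = title_terms("Customer alpha price list - Microsoft Excel", "Microsoft Excel")
print(title_similarity(a, b))
print(visibility_weighted(title_similarity(a, b), 1.0, 0.5))
```

Stripping the application name first matters because otherwise every pair of windows from the same program would share terms and look similar regardless of the documents they show.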
  Resulting Activity Cluster
    • [Figure]
  Relating Documents using the identified Activity Clusters
    • Documents connected to window instances are related to other documents in the same cluster:
      • Calculation of Weighted Degree Centrality C for each window instance w [formula not preserved in the transcript], with W being all window instances and rel(w, w′) being the attached reliability of the relation between windows w and w′
      • Adding a Document Interrelation for all documents within the same cluster with a reliability of C(p) * C(q), with p and q being the window instances the documents are attached to
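The centrality step can be illustrated with a short sketch. The exact formula for C is not preserved in the transcript, so this example assumes that C(w) is the sum of rel(w, w′) over all other windows w′ in W, divided by the number of other windows; the function names and the sample reliabilities are hypothetical. Only the product C(p) * C(q) for the document relation comes from the slide.

```python
from itertools import combinations


def weighted_degree_centrality(windows, rel):
    """Return {window: C(window)} for one activity cluster.

    `rel` maps frozenset({w, v}) to the reliability of that relation.
    ASSUMPTION: C(w) is the sum of the reliabilities of w's relations divided
    by the number of other windows; the slides do not show the exact formula.
    """
    others = max(len(windows) - 1, 1)
    return {
        w: sum(rel.get(frozenset((w, v)), 0.0) for v in windows if v != w) / others
        for w in windows
    }


def document_interrelations(doc_of_window, centrality):
    """Relate each pair of distinct documents with reliability C(p) * C(q)."""
    relations = {}
    for p, q in combinations(sorted(doc_of_window), 2):
        if doc_of_window[p] != doc_of_window[q]:
            pair = (doc_of_window[p], doc_of_window[q])
            relations[pair] = centrality[p] * centrality[q]
    return relations


# Hypothetical cluster of three window instances and their relation reliabilities.
windows = ["w1", "w2", "w3"]
rel = {
    frozenset(("w1", "w2")): 0.8,
    frozenset(("w1", "w3")): 0.4,
    frozenset(("w2", "w3")): 0.2,
}
centrality = weighted_degree_centrality(windows, rel)
docs = {"w1": "price-list.doc", "w2": "sales-offer.doc", "w3": "charts.xls"}
relations = document_interrelations(docs, centrality)
```

Multiplying the two centralities means a document relation is only rated reliable when both windows are well connected within the cluster; a window that is weakly tied to the rest pulls down every relation it takes part in.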
  Evaluation
    • A prototypically implemented sensor plugin was installed on the client desktops of 4 knowledge workers
    • 15 Working Items with durations ranging from 5 minutes to 5 hours
    • Users denying generated relations → false positives
    • Users stating relations not detected by the system → false negatives
    • Average combined error rate of ~4%
    • Computed reliability significantly lower on erroneous relations
  Thank you
    • Semantically Reconnecting Fragmented Information through User Activity Monitoring
    • Hinnerk Brügmann
    • http://consense-project.com
