The document discusses the challenges of personal information management in the digital age. It describes how personal data is fragmented across many different devices and systems. Effective personal information management systems are needed to integrate this diverse personal data and support tasks like search, recollecting memories, and knowledge discovery. The author proposes a context-aware personal information management system called the Digital Self Project, which uses a w5h data model to index personal data by context dimensions like what, who, where, when, why and how. Preliminary results show the w5h search approach improves search accuracy over traditional text search.
2. Personal data is everywhere
Amélie Marian - Rutgers University - SinFra
2015
3. Personal data is exploding
• More and more devices/systems are capturing all parts of our lives:
– Actively: emails, social media, calendar, contacts…
– Passively: GPS, records of financial transactions, records of purchases
– Stealthily: clicks, searches, interactions, TV viewing habits
4. The time for Personal Information Management Systems is now!
“A memex is a device in which an individual stores all
his books, records, and communications, and which is
mechanized so that it may be consulted with exceeding
speed and flexibility. It is an enlarged intimate
supplement to his memory.”
- Vannevar Bush, The Atlantic Monthly, 1945
5. Personal Information Management Challenges
• Data fragmentation
– Storage, archiving
– Data integration
– Data maintenance
– Synchronization
– Data quality
• Data ownership
– Access control
– Privacy
– Sharing
• Functionalities
– Search
– Knowledge discovery and data mining
– Internet of things
6. Saving Personal Data – Old School
7. Searching Personal Data – Old School…
File cabinet, around 1888
8. Personal Information Management – the Digital Age
% grep PIMS /usr/amelie/presentations
9. First-generation PIMS – Desktop based
• Storage
– Archival, safe-keeping
• Organization
– Structure
– Different file types
• Finding and re-finding information
– Different from traditional IR/Web search systems
– Keyword searches not ideal
10. Desktop Search Tools
• Google Desktop Search (defunct)
• Apple Spotlight
• Windows Search
• Lead to frustration when users cannot find information they know they have
• Use IR-style keyword searches, with some metadata filtering
11. Some Past PIMS projects
• Lifestreams
– Time oriented streams
• Stuff I’ve seen
– History of web behavior
• Haystack
– Uniform data model
• Connections, Seetrieve
– Task-based organization
• Dataspaces
– Semantic connections. Data integration
• deskWeb
– Looks at the social network graph
Various uses of
– Context
– Time
– Social network
Limitations
– Limited data integration
– Local storage
– Basic functionalities
12. A changing landscape
Cloud-based model
Heterogeneous data types and formats
Need for richer functionalities
14. Life-logging
From Memex to MyLifeBits
Memex: Memory index or Memory extender
– Hypertext system proposed by Vannevar Bush in 1945
– Compress and store all of one's books, records, and communications…
– Provide an “enlarged intimate supplement to one's memory”
MyLifeBits
– Microsoft Research project with Gordon Bell
– All documents read or produced by Bell, CDs, emails, web pages browsed, phone and instant messaging conversations, etc.
15. Hypermnesia
Exceptionally exact or vivid memory, especially as associated with certain mental illnesses
For a user: We cannot live knowing that any word, any move will leave a trace?
For the ecosystem: We cannot store all the data we produce – lack of storage resources
Forgetting is Key to a Healthy Mind, Scientific American (image: Aaron Goodman)
A main issue is to select the information we choose to keep
16. Memory Tasks
• The “five Rs” memory tasks
– Sellen and Whittaker, CACM 2010
Recollecting
Reminiscing
Retrieving
Reflecting
Remembering intentions
17. Recollecting
• Task-based memory process
• Retracing steps to recollect information
– “Where did I leave my keys”
– “When was the last time I saw Pierre”
• Follow a series of cues to identify information
Need: Connections between memory objects (integration and navigation)
18. Reminiscing
• Browsing through past memories to re-live them
• Experience-based (no specific goal in mind)
– E.g., looking at old photos
Need: Connections between memory objects (integration and navigation)
19. Retrieving
• Retrieving specific information
– Files, documents, pictures
– Data snippets
• Use of metadata
• Can be combined with recollection
Need: Query model, Indexes, and Search algorithms
20. Reflecting
• Learning from the past
– Identify patterns
– Personal data analysis
• Towards a Personal Knowledge Base (PKB)
– Individual vs. shared knowledge
– Privacy concerns
Need: Knowledge Discovery and Mining techniques designed for personal data
21. Remembering Intentions
• Focus on prospective memory
– To-do lists
– Appointment reminders
• Active focus of commercial companies
– Google Now
– Notification apps (time- or location-based)
– Microsoft Personal Agent project?
Need: NLP techniques designed for personal data
22. One more wish: Serendipity
• Hearing by chance a song that is going to totally obsess you
• A suggested book that will change your life
• Entering this small restaurant that you will remember forever
This is serendipitous
• A perfect search engine
• A perfect recommendation system
• A perfect computer assistant
Efficient but not exciting
They lack serendipity
Design programs that would help introduce serendipity in our lives – Focus on the experience
23. Digital Self Project at Rutgers University
• Personal data is rich in contextual information
– We remember our data based on contextual cues
• Individualized context-aware personal information management tool
– Integrate users’ fragmented data
– Support personal information search
– Build a personal knowledge base
Faculty
– Amélie Marian
– Thu Nguyen
– Alex Borgida
Students (past and present)
– Daniela Vianna
– Valia Kalokiri
– Alicia-Michelle Yong
– Chaolun Xia
24. Digital Self Architecture
Architecture
• Data Collection
– Identification, retrieval, storage
– Personal Extraction Tool: https://github.com/ameliemarian/DigitalSelf
• Data Integration
– Multidimensional, context-aware, unified data model
– w5h Model
• Search
– Based on the natural memory retrieval process
– Context-aware, approximate
– w5h Search
• Knowledge Discovery
– Find connections and patterns
– Integrates user behavior and feedback
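To make "context-aware, approximate" search concrete, here is a minimal sketch of how items could be ranked against partial contextual cues. It is not the Digital Self project's algorithm: the `similarity`, `score` and `search` helpers, the dictionary layout of the items, and the use of `difflib` string similarity as the approximate matcher are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Approximate string match in [0, 1]; a stand-in for a real per-dimension matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(item: dict, cues: dict) -> float:
    """Average, over the cued dimensions, of the best match among the item's values."""
    if not cues:
        return 0.0
    total = 0.0
    for dim, cue in cues.items():
        values = item.get(dim, [])
        # An item with no value for a cued dimension contributes 0 for that cue.
        total += max((similarity(cue, v) for v in values), default=0.0)
    return total / len(cues)


def search(items: list, cues: dict, k: int = 3) -> list:
    """Return the k items that best match the cues, best first."""
    ranked = sorted(items, key=lambda item: score(item, cues), reverse=True)
    return ranked[:k]


# Hypothetical items, each described by a few context dimensions.
items = [
    {"id": "email-1", "what": ["dinner plans"], "who": ["Pierre"], "where": ["Paris"]},
    {"id": "photo-7", "what": ["birthday party"], "who": ["Pierre", "Anna"], "where": ["New Brunswick"]},
    {"id": "receipt-3", "what": ["coffee"], "where": ["New Brunswick"]},
]

# A misspelled name still matches approximately.
results = search(items, {"who": "Piere", "where": "New Brunswick"}, k=2)
print([r["id"] for r in results])
```

Scoring each dimension separately lets a half-remembered cue (a misspelled name, a vague place) still contribute to the ranking instead of excluding the item outright.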
25. w5h - Context-aware Data Model
• Personal information is rich in contextual information
– Metadata
– Application data
– Environment knowledge
• Cognitive Psychology
– Contextual cues are strong triggers for autobiographical memories
• Personal information can be modeled and indexed following six dimensions
– what, who, where, when, why and how: the w5h Model
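The six dimensions can be pictured as a record type plus one index entry per (dimension, value) pair. The sketch below is only an illustration under stated assumptions: the `W5hRecord` and `W5hIndex` names, the use of string lists for every dimension, and the comments on what each dimension might hold are hypothetical and are not taken from the project's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

DIMENSIONS = ("what", "who", "where", "when", "why", "how")


@dataclass
class W5hRecord:
    """One personal data item described along the six w5h dimensions."""
    record_id: str
    what: List[str] = field(default_factory=list)   # assumed: content or topic
    who: List[str] = field(default_factory=list)    # assumed: people involved
    where: List[str] = field(default_factory=list)  # assumed: location
    when: List[str] = field(default_factory=list)   # assumed: dates, kept as strings here
    why: List[str] = field(default_factory=list)    # assumed: purpose or task
    how: List[str] = field(default_factory=list)    # assumed: source application or device


class W5hIndex:
    """Maps (dimension, value) pairs to the ids of records carrying that value."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, record: W5hRecord) -> None:
        for dim in DIMENSIONS:
            for value in getattr(record, dim):
                self._postings[(dim, value.lower())].add(record.record_id)

    def lookup(self, **cues: str) -> Set[str]:
        """Ids of records matching every given cue exactly (case-insensitive)."""
        matches = [self._postings.get((dim, value.lower()), set())
                   for dim, value in cues.items()]
        return set.intersection(*matches) if matches else set()


index = W5hIndex()
index.add(W5hRecord("email-1", what=["dinner plans"], who=["Pierre"], how=["email"]))
index.add(W5hRecord("photo-7", what=["birthday party"], who=["Pierre", "Anna"], how=["camera"]))

print(index.lookup(who="pierre"))                # both records mention Pierre
print(index.lookup(who="pierre", how="camera"))  # only the photo
```

Indexing per dimension means a query such as "something from Pierre, taken with a camera" can combine cues from different sources (email, photos) once they share the same model.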
26. Preliminary Results - MRR
Bold: statistically significant (p<0.05)
           w5h     Text                    Solr
           MRR     MRR    % improvement    MRR    % improvement
Group 1    0.40    0.28   30%              0.25   37%
Group 2    0.35    0.24   31%              0.16   54%
Group 3    0.37    0.23   38%              0.18   51%
Group 4    0.29    0.10   65%              0.21   28%
Group 5    0.27    0.06   78%              0.27   0%
Group 6    0.26    0.04   85%              0.28   -8%
MRR for each group and approach and the percentage improvements of the w5h
approach when compared with Text and Solr. Bold cases show that the results are
statistically significant (Wilcoxon one-tailed test, p < 0.05).
w5h: context-aware search, w5h indexes
Text: MongoDB text index over integrated data
Solr: Text index on raw data
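For readers unfamiliar with the metric, MRR (mean reciprocal rank) is the average over queries of 1/rank of the target result, counting 0 when the target is not returned. The sketch below shows that standard definition. The slides do not state how the "% improvement" columns are computed; the reported figures appear consistent with (w5h − baseline) / w5h, to within rounding of the two-decimal MRR values (for Group 1, (0.40 − 0.28) / 0.40 = 30%), so `relative_improvement` encodes that inference as an assumption. All function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], target: Hashable) -> float:
    """1/rank of the target in the ranked list (rank starts at 1), or 0.0 if absent."""
    for position, item in enumerate(ranked, start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[Hashable], Hashable]]) -> float:
    """Average reciprocal rank over (ranked results, target) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, target) for ranked, target in runs) / len(runs)


def relative_improvement(w5h_mrr: float, baseline_mrr: float) -> float:
    """Assumed formula for the table's percentage: (w5h - baseline) / w5h, in percent."""
    return 100.0 * (w5h_mrr - baseline_mrr) / w5h_mrr


# Three hypothetical queries: target ranked first, ranked third, and not returned.
runs = [
    (["a", "b", "c"], "a"),
    (["a", "b", "c"], "c"),
    (["a", "b"], "z"),
]
print(mean_reciprocal_rank(runs))

# Group 1 of the table: w5h 0.40 versus Text 0.28.
print(round(relative_improvement(0.40, 0.28)))
```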
27. Conclusions
• The time for better Personal Information Management is now!
– Many exciting research challenges
– Important ethical and societal implications
• The ability to search and recover past information is a critical feature of future PIMS
– Need to take the specificities of Personal Information search into account
28. References
PIMS:
As We May Think, Vannevar Bush, The Atlantic Monthly, 1945.
Personal Information Management. W. Jones and J. Teevan, editors. University of Washington Press, 2007.
Beyond total capture: a constructive critique of lifelogging, Sellen and Whittaker, CACM 2010.
A tool for personal data extraction. Vianna, Yong, Xia, Marian, and Nguyen, IIWeb 2014.
Microsoft’s Stuff I’ve Seen project (Dumais et al. SIGIR 2003)
MyLifeBits (Gemmell, Bell and Lueder, CACM 2006)
deskWeb (Zerr et al. SIGIR 2010)
Connections (Soules and Ganger, SOSP 2005)
Seetrieve (Gyllstrom and Soules, IUI 2008)
LifeStreams (Fertig, Freeman, and Gelernter, CHI 1996)
Haystack (Karger et al. CIDR 2005)
Data Integration:
A survey of approaches to automatic schema matching, Rahm & Bernstein 2001.
Principles of Data integration, Doan, Halevy, Ives, 2012.
Principles of dataspace systems, Halevy, Franklin, and Maier. CACM, 2006.