SlideShare a Scribd company logo
1 of 28
Personal Information Search and
Discovery
Amélie Marian
Rutgers University
amelie@cs.rutgers.edu
Personal data is everywhere
Amélie Marian - Rutgers University - SinFra
2015
2
Personal data is exploding
• More and more devices/systems are capturing
all parts of our lives:
– Actively
emails, social media, calendar, contacts…
– Passively
GPS, records of financial transactions, records of
purchases
– Stealthily
Clicks, searches, interactions, tv viewing habits
Amélie Marian - Rutgers University - SinFra
2015
3
The time for Personal Information
Management Systems is now!
“A memex is a device in which an individual stores all
his books, records, and communications, and which is
mechanized so that it may be consulted with exceeding
speed and flexibility. It is an enlarged intimate
supplement to his memory.”
- Vannevar Bush, The Atlantic Monthly, 1945
Amélie Marian - Rutgers University - SinFra
2015
4
Personal Information Management
Challenges
• Data fragmentation
– Storage, archiving
– Data integration
– Data maintenance
– Synchronization
– Data quality
• Data ownership
– Access control
– Privacy
– Sharing
• Functionalities
– Search
– Knowledge discovery and data mining
– Internet of things
Amélie Marian - Rutgers University - SinFra
2015
5
Saving Personal Data – Old School
Amélie Marian - Rutgers University - SinFra
2015
6
Searching Personal Data – Old School…
Amélie Marian - Rutgers University - SinFra
2015
7File cabinet
around 1888
Personal Information Management –
the Digital Age
Amélie Marian - Rutgers University - SinFra
2015
8
% grep PIMS /usr/amelie/presentations
First-generation PIMS – Desktop based
• Storage
– Archival, safe-keeping
• Organization
– Structure
– Different file types
• Finding and re-finding information
– Different from traditional IR/Web search systems
– Keyword searches not ideal
Amélie Marian - Rutgers University - SinFra
2015
9
Desktop Search Tools
• Google Desktop Search (defunct)
• Apple Spotlight
• Windows Search
• Lead to frustration when users cannot find
information they know they have
Amélie Marian - Rutgers University - SinFra
2015
10
Use IR-style keyword searches
Some metadata filtering
Some Past PIMS projects
• Lifestreams
– Time oriented streams
• Stuff I’ve seen
– History of web behavior
• Haystack
– Uniform data model
• Connections, Seetrieve
– Task-based organization
• Dataspaces
– Semantic connections. Data
integration
• deskWeb
– Looks at the social network
graph
Various use of
– Context
– Time
– Social network
Limitations
– Limited data integration
– Local storage
– Basic functionalities
Amélie Marian - Rutgers University - SinFra
2015
11
A changing landscape
Cloud-based model
Heterogeneous data types and formats
Need for richer functionalities
Amélie Marian - Rutgers University - SinFra
2015
12
The Future of Personal
Information Search
Life-logging
From Memex to MyLifeBits
Memex: Memory index or Memory extender
– Hypertext system by Vannevar Bush in 1945
– Compress and store all of their books,
records, and communications…
– Provide an "enlarged intimate supplement to
one's memory”
MyLifeBits
– Microsoft Research project with Gordon Bell
– All documents read or produced by Bell, CDs,
emails, web pages browsed, phone and
instant messaging conversations, etc.
Amélie Marian - Rutgers University - SinFra
2015
14
Hypermnesia
Exceptionally exact or vivid memory,
especially as associated with
certain mental illnesses
For a user: We cannot live knowing
that any word, any move will leave
a trace?
For the ecosystem: We cannot store all
the data we produce – lack of
storage resources
Amélie Marian - Rutgers University - SinFra
2015
15
Forgetting is Key to a Healthy Mind
Scientific American
Image: Aaron Goodman
A main issue is to select the information we
choose to keep
Memory Tasks
• The “five Rs” memory tasks
-Sellen and Whitaker, CACM 2010
Recollecting
Reminiscing
Retrieving
Reflecting
Remembering intentions
Amélie Marian - Rutgers University - SinFra
2015
16
Recollecting
• Task-based memory process
• Retracing steps to recollect information
– “Where did I leave my keys”
– “When was the last time I saw Pierre”
• Follow a series of cues to identify information
Amélie Marian - Rutgers University - SinFra
2015
17
Need: Connections between memory objects
(integration and navigation)
Reminiscing
• Browsing through past
memories to re-live
them
• Experience-based (no
specific goal in mind)
– E.g., looking at old
photos
Amélie Marian - Rutgers University - SinFra
2015
18
Need: Connections between memory objects
(integration and navigation)
Retrieving
• Retrieving specific information
– Files, documents, pictures
– Data snippets
• Use of metadata
• Can be combined with recollection
Amélie Marian - Rutgers University - SinFra
2015
19
Need: Query model, Indexes,
and Search algorithms
Reflecting
• Learning from the past
– Identify patterns
– Personal data analysis
• Towards a Personal Knowledge Base (PKB)
– Individual vs. shared knowledge
– Privacy concerns
Amélie Marian - Rutgers University - SinFra
2015
20
Need: Knowledge Discovery and Mining
techniques designed for personal data
Remembering Intentions
• Focus on prospective memory
– To-do lists
– Appointment reminders
• Active focus of commercial companies
– Google Now
– Notification apps (time- or location-based)
– Microsoft Personal Agent project?
Amélie Marian - Rutgers University - SinFra
2015
21
Need: NLP techniques designed for
personal data
One more wish: Serendipity
• Hearing by chance a song that is
going to totally obsess you
• A suggested book that will
change your life
• Entering this small restaurant
that you will remember forever
This is serendipitous
• A perfect search engine
• A perfect recommendation
system
• A perfect computer assistant
Efficient but not exciting
They lack serendipity
Amélie Marian - Rutgers University - SinFra
2015
22
Design programs that would help introduce
serendipity in our lives – Focus on the
experience
Digital Self Project at Rutgers
University
• Personal data is rich in contextual
information
– We remember our data based on
contextual cues
• Individualized context-aware
personal information
management tool
– Integrate users’ fragmented data
– Support personal information search
– Build a personal knowledge base
Faculty
– Amélie Marian
– Thu Nguyen
– Alex Borgida
Students
(past and present)
– Daniela Vianna
– Valia Kalokiri
– Alicia-Michelle Yong
– Chaolun Xia
Amélie Marian - Rutgers University - SinFra
2015
23
Digital Self Architecture
Amélie Marian - Rutgers University - SinFra
2015
24
Architecture
• Data Collection
– Identification, retrieval, storage
– Personal Extraction Tool:
https://github.com/ameliemarian/DigitalSelf
• Data Integration
– Multidimensional, context-
aware, unified data model
– w5h Model
• Search
– based on the natural memory
retrieval process
– Context-aware, approximate
– -w5h Search
• Knowledge Discovery
– Find connections and patterns
– Integrates user behavior and
feedback
w5h - Context-aware Data Model
• Personal information is rich in contextual
information
– Metadata
– Application data
– Environment knowledge
• Cognitive Psychology
– contextual cues are strong triggers for
autobiographical memories
• Personal information can be modeled and
indexed following six dimensions –
– what, who, where, when, why and how - w5h Model
Amélie Marian - Rutgers University - SinFra
2015
25
Preliminary Results - MRR
Bold: statistically
significant (p<0.05)
Amélie Marian - Rutgers University - SinFra
2015
26
w5h Text Solr
MRR MRR % improvements MRR % improvements
Group 1 0.40 0.28 30% 0.25 37%
Group 2 0.35 0.24 31% 0.16 54%
Group 3 0.37 0.23 38% 0.18 51%
Group 4 0.29 0.10 65% 0.21 28%
Group 5 0.27 0.06 78% 0.27 0%
Group 6 0.26 0.04 85% 0.28 -8%
MRR for each group and approach and the percentage improvements of the w5h
approach when compared with Text and Solr. Bold cases show that the results are
statistically significant (Wilcoxon one-tailed test, p < 0.05).
w5h: context aware search, w5h indexes
Text: Mongodb text index over integrated data
Solr: Text index on raw data
Conclusions
• The time for better Personal Information
Management is now!
– Many exciting research challenges
– Important ethical and societal implications
• The ability to search and recover past
information is a critical feature of future PIMS
– Need to take the specificities of Personal
Information search into account
Amélie Marian - Rutgers University - SinFra
2015
27
References
PIMS:
As we may think, Vannevar Bush, the Atlantic Monthly, 2005.
Personal Information Management. W. Jones and J. Teevan, editors.
University of Washington Press, 2007.
Beyond total capture: a constructive critique of Lifelogging, Sellen and Whitaker, CACM 2010.
A tool for personal data extraction. Vianna, Yong, Xia, Marian, and Nguyen, IIWeb 2014.
Microsoft’s Stuff I’ve Seen project (Dumais et al. SIGIR 2003)
MyLifeBits (Gemmel, Bell and Lueder, CACM 2006)
deskWeb (Zerr et al. SIGIR 2010)
Connections (Soules and Ganger, SOSP 2005)
Seetrieve (Gyllstrom and Soules, IUI 2008)
LifeStreams (Fertig, Freeman, and Gelernter, CHI 1996)
Haystack (Karger et al. CIDR 2005)
Data Integration:
A survey of approaches to automatic schema matching, Rahm & Bernstein 2001.
Principles of Data integration, Doan, Halevy, Ives, 2012.
Principles of dataspace systems, Halevy, Franklin, and Maier. CACM, 2006.
Amélie Marian - Rutgers University - SinFra
2015
28

More Related Content

Viewers also liked

From User Needs to Opportunities in Personal Information Management: A Case S...
From User Needs to Opportunities in Personal Information Management: A Case S...From User Needs to Opportunities in Personal Information Management: A Case S...
From User Needs to Opportunities in Personal Information Management: A Case S...Beat Signer
 
Strategies and Actions on Waste for Sustainable Cities
Strategies and Actions on Waste for Sustainable CitiesStrategies and Actions on Waste for Sustainable Cities
Strategies and Actions on Waste for Sustainable CitiesD-Waste
 
default_change.pdf
default_change.pdfdefault_change.pdf
default_change.pdfsptlove
 
How HPCC Systems is Transforming Training
How HPCC Systems is Transforming TrainingHow HPCC Systems is Transforming Training
How HPCC Systems is Transforming TrainingHPCC Systems
 
Bismarck zinkt 27051941
Bismarck zinkt 27051941Bismarck zinkt 27051941
Bismarck zinkt 27051941Jasper Pasgang
 
LEVICK Weekly - Dec 7 2012
LEVICK Weekly - Dec 7 2012LEVICK Weekly - Dec 7 2012
LEVICK Weekly - Dec 7 2012LEVICK
 
The Web as a Tool
The Web as a ToolThe Web as a Tool
The Web as a Tooljschleuss
 
Summer solstice arcidosso 2012 (poster)
Summer solstice arcidosso 2012 (poster)Summer solstice arcidosso 2012 (poster)
Summer solstice arcidosso 2012 (poster)Ale Cignetti
 
Acoustic Correction of High School gym in Florence
Acoustic Correction of High School gym in FlorenceAcoustic Correction of High School gym in Florence
Acoustic Correction of High School gym in Florencearchisonus
 
Bp monline boostsalesperfomance
Bp monline boostsalesperfomanceBp monline boostsalesperfomance
Bp monline boostsalesperfomanceAndrey Dovgan
 
Introduction to Inbound Marketing
Introduction to Inbound MarketingIntroduction to Inbound Marketing
Introduction to Inbound MarketingBrightIdeas.co
 
Q3 2012 Home Improvement Search Trends
Q3 2012 Home Improvement Search TrendsQ3 2012 Home Improvement Search Trends
Q3 2012 Home Improvement Search Trendstonymaull92
 
Global observation session
Global observation sessionGlobal observation session
Global observation sessionIC3Climate
 
випроминювання
випроминюваннявипроминювання
випроминюванняUlanenko
 
Great Festive Offer on all the Computer Courses
Great Festive Offer on all the Computer CoursesGreat Festive Offer on all the Computer Courses
Great Festive Offer on all the Computer CoursesCMS Computer
 
Grab the Latest Offer
Grab the Latest Offer Grab the Latest Offer
Grab the Latest Offer CMS Computer
 
The engagement formula
The engagement formula The engagement formula
The engagement formula Vikas Sinhmar
 

Viewers also liked (20)

From User Needs to Opportunities in Personal Information Management: A Case S...
From User Needs to Opportunities in Personal Information Management: A Case S...From User Needs to Opportunities in Personal Information Management: A Case S...
From User Needs to Opportunities in Personal Information Management: A Case S...
 
Strategies and Actions on Waste for Sustainable Cities
Strategies and Actions on Waste for Sustainable CitiesStrategies and Actions on Waste for Sustainable Cities
Strategies and Actions on Waste for Sustainable Cities
 
default_change.pdf
default_change.pdfdefault_change.pdf
default_change.pdf
 
Il condizionale
Il condizionaleIl condizionale
Il condizionale
 
How HPCC Systems is Transforming Training
How HPCC Systems is Transforming TrainingHow HPCC Systems is Transforming Training
How HPCC Systems is Transforming Training
 
Bismarck zinkt 27051941
Bismarck zinkt 27051941Bismarck zinkt 27051941
Bismarck zinkt 27051941
 
LEVICK Weekly - Dec 7 2012
LEVICK Weekly - Dec 7 2012LEVICK Weekly - Dec 7 2012
LEVICK Weekly - Dec 7 2012
 
The Web as a Tool
The Web as a ToolThe Web as a Tool
The Web as a Tool
 
Summer solstice arcidosso 2012 (poster)
Summer solstice arcidosso 2012 (poster)Summer solstice arcidosso 2012 (poster)
Summer solstice arcidosso 2012 (poster)
 
Acoustic Correction of High School gym in Florence
Acoustic Correction of High School gym in FlorenceAcoustic Correction of High School gym in Florence
Acoustic Correction of High School gym in Florence
 
Bp monline boostsalesperfomance
Bp monline boostsalesperfomanceBp monline boostsalesperfomance
Bp monline boostsalesperfomance
 
Locking down word press
Locking down word pressLocking down word press
Locking down word press
 
Introduction to Inbound Marketing
Introduction to Inbound MarketingIntroduction to Inbound Marketing
Introduction to Inbound Marketing
 
Q3 2012 Home Improvement Search Trends
Q3 2012 Home Improvement Search TrendsQ3 2012 Home Improvement Search Trends
Q3 2012 Home Improvement Search Trends
 
Global observation session
Global observation sessionGlobal observation session
Global observation session
 
випроминювання
випроминюваннявипроминювання
випроминювання
 
Great Festive Offer on all the Computer Courses
Great Festive Offer on all the Computer CoursesGreat Festive Offer on all the Computer Courses
Great Festive Offer on all the Computer Courses
 
Grab the Latest Offer
Grab the Latest Offer Grab the Latest Offer
Grab the Latest Offer
 
Ro sita
Ro sitaRo sita
Ro sita
 
The engagement formula
The engagement formula The engagement formula
The engagement formula
 

Similar to Personal Information Search and Discovery

Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...SEAD
 
Big Data for Student Learning
Big Data for Student LearningBig Data for Student Learning
Big Data for Student LearningMarie Bienkowski
 
Data sharing in the age of the Social Machine
Data sharing in the age of the Social MachineData sharing in the age of the Social Machine
Data sharing in the age of the Social MachineUlrik Lyngs
 
Managing Research Data in the Life Sciences
Managing Research Data in the Life SciencesManaging Research Data in the Life Sciences
Managing Research Data in the Life Sciencesalwerhane
 
Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web DataMarieke Guy
 
The Embedded Data Librarian
The Embedded Data LibrarianThe Embedded Data Librarian
The Embedded Data LibrarianLibrary_Connect
 
Graph and Semantic Search
Graph and Semantic Search Graph and Semantic Search
Graph and Semantic Search notess
 
香港六合彩
香港六合彩香港六合彩
香港六合彩wejia
 
Hana Marcetic: Exploring the methods and practises of personal digital inform...
Hana Marcetic: Exploring the methods and practises of personal digital inform...Hana Marcetic: Exploring the methods and practises of personal digital inform...
Hana Marcetic: Exploring the methods and practises of personal digital inform...KISK FF MU
 
Come to the library to learn how not to smile at a crocodile
Come to the library to learn how not to smile at a crocodileCome to the library to learn how not to smile at a crocodile
Come to the library to learn how not to smile at a crocodileRoxanne Missingham
 
Learning Analytics: New thinking supporting educational research
Learning Analytics: New thinking supporting educational researchLearning Analytics: New thinking supporting educational research
Learning Analytics: New thinking supporting educational researchAndrew Deacon
 
Research Data Services at the University of Utah
Research Data Services at the University of UtahResearch Data Services at the University of Utah
Research Data Services at the University of UtahRebekah Cummings
 
ICPSR presentation for Research Advisory
ICPSR presentation for Research AdvisoryICPSR presentation for Research Advisory
ICPSR presentation for Research AdvisoryLynda Kellam
 
Working in a Global Environment - Success Strategies for Today's Information ...
Working in a Global Environment - Success Strategies for Today's Information ...Working in a Global Environment - Success Strategies for Today's Information ...
Working in a Global Environment - Success Strategies for Today's Information ...SJSU School of Information
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the libraryColleen DeLory
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the libraryLibrary_Connect
 
Visualising activity in learning networks using open data and educational ...
Visualising activity in learning networks   using open data and educational  ...Visualising activity in learning networks   using open data and educational  ...
Visualising activity in learning networks using open data and educational ...Michael Paskevicius
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in EducationPhilip Piety
 
Start with the Staff
Start with the StaffStart with the Staff
Start with the StaffALISS
 
Modern Knowledge Management
Modern Knowledge ManagementModern Knowledge Management
Modern Knowledge ManagementAgnes Molnar
 

Similar to Personal Information Search and Discovery (20)

Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
 
Big Data for Student Learning
Big Data for Student LearningBig Data for Student Learning
Big Data for Student Learning
 
Data sharing in the age of the Social Machine
Data sharing in the age of the Social MachineData sharing in the age of the Social Machine
Data sharing in the age of the Social Machine
 
Managing Research Data in the Life Sciences
Managing Research Data in the Life SciencesManaging Research Data in the Life Sciences
Managing Research Data in the Life Sciences
 
Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web Data
 
The Embedded Data Librarian
The Embedded Data LibrarianThe Embedded Data Librarian
The Embedded Data Librarian
 
Graph and Semantic Search
Graph and Semantic Search Graph and Semantic Search
Graph and Semantic Search
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
Hana Marcetic: Exploring the methods and practises of personal digital inform...
Hana Marcetic: Exploring the methods and practises of personal digital inform...Hana Marcetic: Exploring the methods and practises of personal digital inform...
Hana Marcetic: Exploring the methods and practises of personal digital inform...
 
Come to the library to learn how not to smile at a crocodile
Come to the library to learn how not to smile at a crocodileCome to the library to learn how not to smile at a crocodile
Come to the library to learn how not to smile at a crocodile
 
Learning Analytics: New thinking supporting educational research
Learning Analytics: New thinking supporting educational researchLearning Analytics: New thinking supporting educational research
Learning Analytics: New thinking supporting educational research
 
Research Data Services at the University of Utah
Research Data Services at the University of UtahResearch Data Services at the University of Utah
Research Data Services at the University of Utah
 
ICPSR presentation for Research Advisory
ICPSR presentation for Research AdvisoryICPSR presentation for Research Advisory
ICPSR presentation for Research Advisory
 
Working in a Global Environment - Success Strategies for Today's Information ...
Working in a Global Environment - Success Strategies for Today's Information ...Working in a Global Environment - Success Strategies for Today's Information ...
Working in a Global Environment - Success Strategies for Today's Information ...
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the library
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the library
 
Visualising activity in learning networks using open data and educational ...
Visualising activity in learning networks   using open data and educational  ...Visualising activity in learning networks   using open data and educational  ...
Visualising activity in learning networks using open data and educational ...
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in Education
 
Start with the Staff
Start with the StaffStart with the Staff
Start with the Staff
 
Modern Knowledge Management
Modern Knowledge ManagementModern Knowledge Management
Modern Knowledge Management
 

More from Amélie Marian

Integration and Exploration of Connected Personal Digital Traces
Integration and Exploration of Connected Personal Digital TracesIntegration and Exploration of Connected Personal Digital Traces
Integration and Exploration of Connected Personal Digital TracesAmélie Marian
 
Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015Amélie Marian
 
Personalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random WalksPersonalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random WalksAmélie Marian
 
Corroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative StatementsCorroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative StatementsAmélie Marian
 
Remembrance of data past
Remembrance of data pastRemembrance of data past
Remembrance of data pastAmélie Marian
 
Searching data with substance and style
Searching data with substance and styleSearching data with substance and style
Searching data with substance and styleAmélie Marian
 

More from Amélie Marian (7)

Integration and Exploration of Connected Personal Digital Traces
Integration and Exploration of Connected Personal Digital TracesIntegration and Exploration of Connected Personal Digital Traces
Integration and Exploration of Connected Personal Digital Traces
 
Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015
 
Personalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random WalksPersonalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random Walks
 
Corroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative StatementsCorroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative Statements
 
Searching Web Forums
Searching Web ForumsSearching Web Forums
Searching Web Forums
 
Remembrance of data past
Remembrance of data pastRemembrance of data past
Remembrance of data past
 
Searching data with substance and style
Searching data with substance and styleSearching data with substance and style
Searching data with substance and style
 

Personal Information Search and Discovery

  • 1. Personal Information Search and Discovery Amélie Marian Rutgers University amelie@cs.rutgers.edu
  • 2. Personal data is everywhere Amélie Marian - Rutgers University - SinFra 2015 2
  • 3. Personal data is exploding • More and more devices/systems are capturing all parts of our lives: – Actively emails, social media, calendar, contacts… – Passively GPS, records of financial transactions, records of purchases – Stealthily Clicks, searches, interactions, tv viewing habits Amélie Marian - Rutgers University - SinFra 2015 3
  • 4. The time for Personal Information Management Systems is now! “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” - Vannevar Bush, The Atlantic Monthly, 1945 Amélie Marian - Rutgers University - SinFra 2015 4
  • 5. Personal Information Management Challenges • Data fragmentation – Storage, archiving – Data integration – Data maintenance – Synchronization – Data quality • Data ownership – Access control – Privacy – Sharing • Functionalities – Search – Knowledge discovery and data mining – Internet of things Amélie Marian - Rutgers University - SinFra 2015 5
  • 6. Saving Personal Data – Old School Amélie Marian - Rutgers University - SinFra 2015 6
  • 7. Searching Personal Data – Old School… Amélie Marian - Rutgers University - SinFra 2015 7File cabinet around 1888
  • 8. Personal Information Management – the Digital Age Amélie Marian - Rutgers University - SinFra 2015 8 % grep PIMS /usr/amelie/presentations
  • 9. First-generation PIMS – Desktop based • Storage – Archival, safe-keeping • Organization – Structure – Different file types • Finding and re-finding information – Different from traditional IR/Web search systems – Keyword searches not ideal Amélie Marian - Rutgers University - SinFra 2015 9
  • 10. Desktop Search Tools • Google Desktop Search (defunct) • Apple Spotlight • Windows Search • Lead to frustration when users cannot find information they know they have Amélie Marian - Rutgers University - SinFra 2015 10 Use IR-style keyword searches Some metadata filtering
  • 11. Some Past PIMS projects • Lifestreams – Time oriented streams • Stuff I’ve seen – History of web behavior • Haystack – Uniform data model • Connections, Seetrieve – Task-based organization • Dataspaces – Semantic connections. Data integration • deskWeb – Looks at the social network graph Various use of – Context – Time – Social network Limitations – Limited data integration – Local storage – Basic functionalities Amélie Marian - Rutgers University - SinFra 2015 11
  • 12. A changing landscape Cloud-based model Heterogeneous data types and formats Need for richer functionalities Amélie Marian - Rutgers University - SinFra 2015 12
  • 13. The Future of Personal Information Search
  • 14. Life-logging From Memex to MyLifeBits Memex: Memory index or Memory extender – Hypertext system by Vannevar Bush in 1945 – Compress and store all of their books, records, and communications… – Provide an "enlarged intimate supplement to one's memory” MyLifeBits – Microsoft Research project with Gordon Bell – All documents read or produced by Bell, CDs, emails, web pages browsed, phone and instant messaging conversations, etc. Amélie Marian - Rutgers University - SinFra 2015 14
  • 15. Hypermnesia Exceptionally exact or vivid memory, especially as associated with certain mental illnesses For a user: We cannot live knowing that any word, any move will leave a trace? For the ecosystem: We cannot store all the data we produce – lack of storage resources Amélie Marian - Rutgers University - SinFra 2015 15 Forgetting is Key to a Healthy Mind Scientific American Image: Aaron Goodman A main issue is to select the information we choose to keep
  • 16. Memory Tasks • The “five Rs” memory tasks -Sellen and Whitaker, CACM 2010 Recollecting Reminiscing Retrieving Reflecting Remembering intentions Amélie Marian - Rutgers University - SinFra 2015 16
  • 17. Recollecting • Task-based memory process • Retracing steps to recollect information – “Where did I leave my keys” – “When was the last time I saw Pierre” • Follow a series of cues to identify information Amélie Marian - Rutgers University - SinFra 2015 17 Need: Connections between memory objects (integration and navigation)
  • 18. Reminiscing • Browsing through past memories to re-live them • Experience-based (no specific goal in mind) – E.g., looking at old photos Amélie Marian - Rutgers University - SinFra 2015 18 Need: Connections between memory objects (integration and navigation)
  • 19. Retrieving • Retrieving specific information – Files, documents, pictures – Data snippets • Use of metadata • Can be combined with recollection Amélie Marian - Rutgers University - SinFra 2015 19 Need: Query model, Indexes, and Search algorithms
  • 20. Reflecting • Learning from the past – Identify patterns – Personal data analysis • Towards a Personal Knowledge Base (PKB) – Individual vs. shared knowledge – Privacy concerns Amélie Marian - Rutgers University - SinFra 2015 20 Need: Knowledge Discovery and Mining techniques designed for personal data
  • 21. Remembering Intentions • Focus on prospective memory – To-do lists – Appointment reminders • Active focus of commercial companies – Google Now – Notification apps (time- or location-based) – Microsoft Personal Agent project? Amélie Marian - Rutgers University - SinFra 2015 21 Need: NLP techniques designed for personal data
  • 22. One more wish: Serendipity • Hearing by chance a song that is going to totally obsess you • A suggested book that will change your life • Entering this small restaurant that you will remember forever This is serendipitous • A perfect search engine • A perfect recommendation system • A perfect computer assistant Efficient but not exciting They lack serendipity Amélie Marian - Rutgers University - SinFra 2015 22 Design programs that would help introduce serendipity in our lives – Focus on the experience
  • 23. Digital Self Project at Rutgers University • Personal data is rich in contextual information – We remember our data based on contextual cues • Individualized context-aware personal information management tool – Integrate users’ fragmented data – Support personal information search – Build a personal knowledge base Faculty – Amélie Marian – Thu Nguyen – Alex Borgida Students (past and present) – Daniela Vianna – Valia Kalokiri – Alicia-Michelle Yong – Chaolun Xia Amélie Marian - Rutgers University - SinFra 2015 23
  • 24. Digital Self Architecture Amélie Marian - Rutgers University - SinFra 2015 24 Architecture • Data Collection – Identification, retrieval, storage – Personal Extraction Tool: https://github.com/ameliemarian/DigitalSelf • Data Integration – Multidimensional, context- aware, unified data model – w5h Model • Search – based on the natural memory retrieval process – Context-aware, approximate – -w5h Search • Knowledge Discovery – Find connections and patterns – Integrates user behavior and feedback
  • 25. w5h - Context-aware Data Model • Personal information is rich in contextual information – Metadata – Application data – Environment knowledge • Cognitive Psychology – contextual cues are strong triggers for autobiographical memories • Personal information can be modeled and indexed following six dimensions – – what, who, where, when, why and how - w5h Model Amélie Marian - Rutgers University - SinFra 2015 25
  • 26. Preliminary Results - MRR Bold: statistically significant (p<0.05) Amélie Marian - Rutgers University - SinFra 2015 26 w5h Text Solr MRR MRR % improvements MRR % improvements Group 1 0.40 0.28 30% 0.25 37% Group 2 0.35 0.24 31% 0.16 54% Group 3 0.37 0.23 38% 0.18 51% Group 4 0.29 0.10 65% 0.21 28% Group 5 0.27 0.06 78% 0.27 0% Group 6 0.26 0.04 85% 0.28 -8% MRR for each group and approach and the percentage improvements of the w5h approach when compared with Text and Solr. Bold cases show that the results are statistically significant (Wilcoxon one-tailed test, p < 0.05). w5h: context aware search, w5h indexes Text: Mongodb text index over integrated data Solr: Text index on raw data
  • 27. Conclusions • The time for better Personal Information Management is now! – Many exciting research challenges – Important ethical and societal implications • The ability to search and recover past information is a critical feature of future PIMS – Need to take the specificities of Personal Information search into account Amélie Marian - Rutgers University - SinFra 2015 27
  • 28. References PIMS: As we may think, Vannevar Bush, the Atlantic Monthly, 2005. Personal Information Management. W. Jones and J. Teevan, editors. University of Washington Press, 2007. Beyond total capture: a constructive critique of Lifelogging, Sellen and Whitaker, CACM 2010. A tool for personal data extraction. Vianna, Yong, Xia, Marian, and Nguyen, IIWeb 2014. Microsoft’s Stuff I’ve Seen project (Dumais et al. SIGIR 2003) MyLifeBits (Gemmel, Bell and Lueder, CACM 2006) deskWeb (Zerr et al. SIGIR 2010) Connections (Soules and Ganger, SOSP 2005) Seetrieve (Gyllstrom and Soules, IUI 2008) LifeStreams (Fertig, Freeman, and Gelernter, CHI 1996) Haystack (Karger et al. CIDR 2005) Data Integration: A survey of approaches to automatic schema matching, Rahm & Bernstein 2001. Principles of Data integration, Doan, Halevy, Ives, 2012. Principles of dataspace systems, Halevy, Franklin, and Maier. CACM, 2006. Amélie Marian - Rutgers University - SinFra 2015 28