Remembrance of data past

Remembrance of Data Past
Using Context in Personal Information Search
Amélie Marian, Rutgers University
Thu D. Nguyen, Rutgers University
Daniela Vianna, Rutgers University
Luan Nguyen, Rutgers University

What was the name of that
restaurant?
• I went there with Julia
• We had dinner
• It was pouring rain

Some Sources of helpful data
“With Julia”: Calendar, email, text
“Restaurant”: Check-ins, cell phone GPS logs
“Restaurant”: Credit Card statements
“Pouring rain”: Historical Weather reports

The Web
[Figure: the Web as hypertext, a universal library of text and multimedia, personal/private data, and social data]


Personal data is fragmented


We remember our data based
on context clues
• “Serge sent me this file while we were on a
conference call with Alkis”
Skype, Google Hangouts, email, calendar, file system
• “I found this shopping web site while talking to
Tova on Skype. She was wearing a blue dress.”
Skype (+ snapshot), calendar, browser history
• “Are my insurance reimbursements up to date?”
Calendar, insurance account, bank account


We also remember data from
our social network
• “Mohan posted this interesting article on CS
education on Facebook, or maybe on Twitter, or
maybe it was Moshe Vardi who posted it”
Facebook, Twitter, browser history
• “What are the books my friends recommended?”
Facebook (and comments), Twitter, emails
• “What are the places in Maui that my friends
enjoyed?”
Facebook, Twitter, emails, Foursquare


Data dimensions
• Follow natural interrogative words:
• what? (content)
• who? (with whom, from whom, to whom,...)
• where? (physical or logical, in the real-world
and in the system)
• when? (time and date, but also what was
happening concurrently, before and after)
• why? (sequence of data/events that are
connected)
• how? (application, author, environment).
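
One way to picture these dimensions is as fields of a single record per data item. The sketch below is only an illustration under stated assumptions: the `ContextRecord` class, the `matches_clues` helper, and all dates and values are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContextRecord:
    """Hypothetical record describing one data item along the six dimensions."""
    what: str                                      # content, e.g. text or title
    who: List[str] = field(default_factory=list)   # people involved
    where: Optional[str] = None                    # physical or logical location
    when: Optional[date] = None                    # date of the item or event
    why: List[str] = field(default_factory=list)   # ids of connected items/events
    how: Optional[str] = None                      # application or source


def matches_clues(record: ContextRecord, who: Optional[str] = None,
                  when: Optional[date] = None, how: Optional[str] = None) -> bool:
    """Return True if the record satisfies every clue that was supplied."""
    if who is not None and who not in record.who:
        return False
    if when is not None and record.when != when:
        return False
    if how is not None and record.how != how:
        return False
    return True


# Invented sample data, loosely modeled on the restaurant question.
records = [
    ContextRecord(what="Dinner", who=["Julia"], when=date(2013, 5, 8), how="calendar"),
    ContextRecord(what="Card payment", where="restaurant", when=date(2013, 5, 8),
                  how="credit card statement"),
    ContextRecord(what="Lunch", who=["Serge"], when=date(2013, 5, 9), how="calendar"),
]

# Use the "who" clue to find the day, then the "when" clue to pull in other sources.
dinner_day = next(r.when for r in records if "Julia" in r.who)
same_day = [r.what for r in records if matches_clues(r, when=dinner_day)]
```

Here the "who" clue locates the calendar entry, and the "when" clue then connects it to the card payment from a different source.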

What is an answer?
• Content
• Email
• File
• Link
• List of objects (insurance reimbursements)
• But also part of the context
• Location
• Meeting participants
• Time


Personal Data Context
• Explicit
• Metadata information stored by the file system or
application, e.g., timestamp, GPS location, tags, directory
structure.
• Implicit
• Identified through application-based semantic
information, e.g., email recipients, calendar meeting
participants, check-in location
• Inferred
• Knowledge about the environment of the data collection.
• System environment (Which applications/documents were
opened concurrently with a given document)
• Social environment (Which Facebook members had access to
an event)
• Real-world environment (Who was physically in the room, from
RFID tags or Skype; the weather).
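
As an illustration of inferred system-environment context, the sketch below finds which documents were open at the same time as a given document. It assumes a hypothetical log that maps each document to its open and close times; the log format, the file names, and the `opened_concurrently` function are assumptions, not part of the system described in the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical log: document -> (open_time, close_time) in minutes since midnight.
OpenLog = Dict[str, Tuple[int, int]]


def opened_concurrently(log: OpenLog, target: str) -> List[str]:
    """Return documents whose open interval overlaps the target's interval."""
    start, end = log[target]
    return sorted(
        name for name, (s, e) in log.items()
        # Two intervals overlap when each one starts before the other ends.
        if name != target and s < end and start < e
    )


log = {
    "budget.xls": (600, 660),
    "call-notes.txt": (630, 700),
    "photo.jpg": (700, 720),
}
concurrent = opened_concurrently(log, "budget.xls")
```

In this made-up log only `call-notes.txt` overlaps `budget.xls`, so it would be recorded as part of that document's inferred context.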

Challenges
• Indexing content and context
• Semantic analysis for extracted context
• Data integration
• Identify inferred context
• Store and index as it is produced (system environment)
• Use API calls on-demand or copy information (social and
real-world environment)
• Unified data model
• Content and structure
• Data in context
• Navigation


Challenges (2)
• Powerful data tools
• Access and query (possibly remote) sources
• Search based on content and contextual clues
• Approximate matching
• Explore data to get relevant information
• Discover new relevant information
• “It’s been six months, you need to make a dentist
appointment!”
• “You forgot to pay the home insurance bill!”
• “Last time you bought toothpaste was a month ago,
you are probably running out.”
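
The discovery examples above can be read as a simple elapsed-time rule: remind the user when the usual interval since an event's last occurrence has passed. The sketch below is one hypothetical way to express that rule; the event names, intervals, dates, and the `due_reminders` function are all invented for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List


def due_reminders(last_seen: Dict[str, date], interval: Dict[str, timedelta],
                  today: date) -> List[str]:
    """Return the events whose usual interval has elapsed since they last occurred."""
    return sorted(
        event for event, last in last_seen.items()
        if event in interval and today - last >= interval[event]
    )


# Invented history: when each event was last observed in the user's data.
last_seen = {
    "dentist appointment": date(2013, 1, 10),
    "toothpaste purchase": date(2013, 7, 1),
}
# Assumed usual intervals between occurrences.
interval = {
    "dentist appointment": timedelta(days=182),
    "toothpaste purchase": timedelta(days=30),
}
reminders = due_reminders(last_seen, interval, today=date(2013, 7, 20))
```

A real assistant would have to learn the intervals from the data rather than take them as given, which is part of the challenge listed above.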


Unified Structure, Content, and Metadata Search
Previous results: EDBT’08, ICDE’08 (demo), EDBT’11, TKDE’12
with Wei Wang, Chris Peery, and Thu D. Nguyen

• Data and query models that unify content and
structure along one dimension
• System metadata seen as a separate dimension
• A unified multi-dimensional scoring mechanism
• IDF-based scores for each dimension
• Individual dimension scores easily combined
• TF scores to break ties
• Query processing algorithms and index structures
to score and rank answers efficiently
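
A minimal sketch of the ranking idea above: each file carries one IDF-based score per dimension, the scores are combined, and the TF score is used only to break ties. The slides do not say how dimension scores are combined, so the plain sum used here is an assumption, as are the file names and score values.

```python
from typing import Dict, List, Tuple

# (file name, per-dimension IDF scores, TF score); all values are made up.
Candidate = Tuple[str, Dict[str, float], float]


def rank(candidates: List[Candidate]) -> List[str]:
    """Order files by combined IDF score, using the TF score only to break ties."""
    def key(c: Candidate) -> Tuple[float, float]:
        _, idf_by_dimension, tf = c
        # Assumption: dimension scores are combined by summing them.
        return (sum(idf_by_dimension.values()), tf)
    return [name for name, _, _ in sorted(candidates, key=key, reverse=True)]


candidates = [
    ("a.txt", {"content_structure": 0.5, "metadata": 0.25}, 0.1),
    ("b.pdf", {"content_structure": 0.25, "metadata": 0.5}, 0.4),
    ("c.doc", {"content_structure": 0.5, "metadata": 0.5}, 0.0),
]
ranking = rank(candidates)
```

`c.doc` ranks first on combined IDF score; `a.txt` and `b.pdf` tie on that score, so the higher TF score places `b.pdf` ahead.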


Unified Structure and Content
Target file: Halloween party pictures taken at home where someone
wears a witch costume

//Home*.//“Halloween” and .//“witch”+
[Figure: unified data tree with nodes root, Home, “Halloween”, and “witch”, and a file boundary]


Unified IDF Score
For a unified data tree T, a path query PQ, and a file
F, we define:
• IDF Score
$$\mathrm{score}_{idf}(PQ) = \frac{\log \frac{N}{|\mathrm{matches}(T, PQ)|}}{\log N}$$
where N is the total number of files, and matches(T, PQ) is the
set of files that match PQ in T.
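
The IDF score can be computed directly from the two counts in its definition. The sketch below assumes the score is the log of N over the number of matching files, normalized by log N; the function name and the handling of edge cases (fewer than two files, or no matches) are assumptions not covered by the slide.

```python
import math


def idf_score(total_files: int, matching_files: int) -> float:
    """Normalized IDF: log(N / |matches|) / log(N)."""
    if total_files < 2:
        raise ValueError("need at least two files so that log N is non-zero")
    if not 1 <= matching_files <= total_files:
        raise ValueError("the query must match between 1 and N files")
    return math.log(total_files / matching_files) / math.log(total_files)


rare = idf_score(1000, 1)       # query matched by a single file
common = idf_score(1000, 1000)  # query matched by every file
middle = idf_score(1000, 10)    # query matched by 1% of the files
```

The score is 1 for a query matched by a single file and 0 for a query matched by every file, so more selective queries contribute more to a file's rank.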


Case Study
Target file: Electronic version of the novel SeaWolf by Jack London
• Date: 26 Feb 07
• File Extension: .txt
• Directory: Personal/Ebook/Novel/JackLondon

Content and filtering Query
• Keywords: sea, wolf, jack, london
• Directory: /JackLondon/Ebooks
• Result: Target file does not appear in result

Approximate Query
• Keywords: sea, wolf, jack, london
• Directory: /JackLondon/Ebooks
• Result: Target file at Rank 3

Content and filtering Query
• Keywords: sea, wolf, jack, london
• Date: 19 Feb 07; type: pdf
• Directory: /JackLondon/Ebooks
• Result: Target file does not appear in result

Approximate Query
• Keywords: sea, wolf, jack, london
• Date: 19 Feb 07; type: pdf
• Directory: /JackLondon/Ebooks
• Result: Target file at Rank 2
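
The difference between the two query styles in this case study can be sketched with a toy example. Strict filtering rejects the target file as soon as one misremembered condition fails, while an approximate query still gives it a non-zero score. The scoring below (an average of per-condition agreement) is a deliberately simplified stand-in, not the IDF-based scoring from the talk; the dictionary layout and function names are assumptions.

```python
from typing import Dict

# Target file and misremembered query conditions, taken from the case study.
target = {"date": "26 Feb 07", "type": "txt",
          "directory": "Personal/Ebook/Novel/JackLondon"}
query = {"date": "19 Feb 07", "type": "pdf", "directory": "/JackLondon/Ebooks"}


def directory_overlap(file_dir: str, query_dir: str) -> float:
    """Fraction of the query's path components found anywhere in the file's path."""
    wanted = [p for p in query_dir.split("/") if p]
    have = {p for p in file_dir.split("/") if p}
    return sum(p in have for p in wanted) / len(wanted)


def filter_match(f: Dict[str, str], q: Dict[str, str]) -> bool:
    """Strict filtering: every condition must hold exactly."""
    return (f["date"] == q["date"] and f["type"] == q["type"]
            and directory_overlap(f["directory"], q["directory"]) == 1.0)


def approximate_score(f: Dict[str, str], q: Dict[str, str]) -> float:
    """Toy relaxation: average the per-condition agreement instead of rejecting."""
    parts = [
        1.0 if f["date"] == q["date"] else 0.0,
        1.0 if f["type"] == q["type"] else 0.0,
        directory_overlap(f["directory"], q["directory"]),
    ]
    return sum(parts) / len(parts)


kept_by_filter = filter_match(target, query)
score = approximate_score(target, query)
```

The filter drops the target file, but the partial directory match keeps its approximate score above zero, so it can still be ranked among the answers.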

Conclusions
• First step towards an automated Personal Data
Assistant
• Looks at data and its context
• Gathers personal data from remote sources
• Cloud applications, social networks, emails, phone
logs, financial accounts, friends’ public data, …
• Integrates data in a unified data model
• Based on natural questions
• Provides search and discovery capabilities
• Beyond keyword search
• Context-aware

Funded by a Google Research Award

Ushi Wakamaru!
(that’s the restaurant)

