Integration and Exploration
of Connected
Personal Digital Traces
Valia Kalokyri, Alex Borgida, Amélie Marian, Daniela Vianna
Rutgers University
Personal data is fragmented, heterogeneous
5/19/17 Amélie Marian - Rutgers University - ExploreDB'17 2
DigitalSelf Project: Goals
1. Integrate personal data from various heterogeneous sources
2. Design of a unified and intuitive model to link and
represent personal information
3. Group personal data with respect to conceptually
coherent episodes – Creation of a Personal Knowledge
Base
4. Search tools for digital memories
5. Design of interactive tools to provide users with narrative
views of their digital memories.
PIM – Personal Information Management
• Traditional PIM Systems – focus on relationships between objects
• Haystack
• Semex
• OntoPim
• …
• We focus on a narrative of events
• Exploration of connections between events – or Personal Digital Traces (PDTs)
Background
• Research in psychology:
Episodic memory – memory of autobiographical events
• It is the collection of past personal experiences that occurred at a particular time
and place (times, places, associated emotions, and other contextual who, what,
when, where, why knowledge that can be explicitly stated or conjured up)
• The natural way to remember past events is through pertinent contextual
information, i.e., answers to:
• Who, What, When, Where, Why, How (w5h)
• Derived from the "frame" structure of the events that involve the
digital documents
Integrating Personal Data
• Create an infrastructure to retrieve and store personal data
• Gather content from several online services (via APIs, IMAP)
• Social data - Facebook, Twitter, LinkedIn
• Geolocation data - Foursquare
• Email - Gmail or any other email account
• Calendars - Google Calendar
• Personal files - local file system, Google Drive, Dropbox
• Web browsing histories - Chrome, Firefox
• Apply entity resolution along the who and where dimensions
IIWeb'14 paper; open source on GitHub
Contributions
• High-level description of episodic scripts
• Group events (PDTs) to connect them into a memory episode
• Scripts: prototypical plans, “a predetermined, stereotyped sequence of
actions that defines a well-known situation”. (Schank and Abelson)
• Heuristic algorithm to find and combine PDTs into scripts
• Case study: Eating out script
• Script description
• Evaluation with user data
Goal: Organize & summarize PDTs into episodes
Allow users to explore, understand and learn from their actions
Grouping Data into Coherent Episodes
• Provide a narrative by making connections between PDTs
• Example - Going out to eat at a restaurant
• The script describes the possible “event flows” (arrange
where & when to go, make a reservation, call a cab/Uber, go to the
restaurant, order food, [...], pay, [...], return, [...])
• Emails concerning a dinner
• OpenTable reservation at a restaurant
• Foursquare check-in with photos
• Credit card payment
Narrative for going out to dinner
Ontology for scripts
[Figure: UML activity diagram for Eating_Out, with activities InitiateGoingOut, EstablishWhereEat, EstablishWhenEat, EstablishWhoEat, MakeReservation <restaurant>, and AttendEatingOut]
Algorithm for instantiating script instances
1. Create a list of “trigger words/phrases”, whose occurrence
indicates that a document has something to do with an
instance of a particular script type.
• Start with the goal events/subscripts, e.g. AttendEatingOut
• E.g. “eat”, “eat out” and all their synonyms and hyponyms (WordNet,
ConceptNet5)
• Consider the w5h participants of the goal event (VerbNet, FrameNet)
• E.g. “restaurant” is a where value of “eat” for Eating_Out
• The result is a list of words to search for
• E.g. breakfast, lunch, dinner, restaurant and its hyponyms etc.
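Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the small hand-written dictionaries are hypothetical stand-ins for WordNet/ConceptNet5 (synonyms, hyponyms) and VerbNet/FrameNet (w5h participants), and the function names are invented for the example.

```python
import re

# HYPOTHETICAL mini-lexicon standing in for WordNet/ConceptNet5 lookups.
SYNONYMS_AND_HYPONYMS = {
    "eat": ["eat out", "dine", "breakfast", "lunch", "dinner", "brunch"],
}
# HYPOTHETICAL w5h roles standing in for VerbNet/FrameNet:
# "restaurant" fills the *where* role of "eat" in Eating_Out.
W5H_PARTICIPANTS = {
    "eat": {"where": ["restaurant"]},
}
# HYPOTHETICAL hyponyms of the role fillers.
HYPONYMS = {
    "restaurant": ["diner", "bistro", "pizzeria", "cafe"],
}


def build_trigger_list(goal_verbs):
    """Expand goal-event verbs into a sorted list of trigger words/phrases."""
    triggers = set()
    for verb in goal_verbs:
        triggers.add(verb)
        triggers.update(SYNONYMS_AND_HYPONYMS.get(verb, []))
        # Add the w5h role fillers of the goal event and their hyponyms.
        for fillers in W5H_PARTICIPANTS.get(verb, {}).values():
            for word in fillers:
                triggers.add(word)
                triggers.update(HYPONYMS.get(word, []))
    return sorted(triggers)


def matching_triggers(text, triggers):
    """Return the triggers occurring in `text` as whole words (case-insensitive)."""
    lowered = text.lower()
    return [t for t in triggers
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


triggers = build_trigger_list(["eat"])
print(matching_triggers("Lunch at the new pizzeria tomorrow?", triggers))
# prints ['lunch', 'pizzeria']
```

A PDT with at least one match would then be retrieved for further processing; word-boundary matching keeps "dine" from firing on "diner".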
Algorithm for instantiating script instances
2. All retrieved PDTs are preprocessed:
• Entity extraction (Stanford NLP tools, NLTK)
• Who, Where
• Time extraction: making temporal information explicit and unambiguous
• E.g., relative expressions such as “tomorrow” or “this Wednesday” are converted to absolute dates
• Technologies used: Stanford NLP tools, NLTK, Python dateparser, our own regular expressions
• Group certain kinds of documents into single individuals
• E.g., email threads, Facebook messages, etc.
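Two of the preprocessing steps can be sketched with the standard library alone. This is a hedged illustration under stated assumptions: the resolver is a hypothetical stand-in for Python dateparser plus custom regular expressions and only understands "today", "tomorrow", "this <weekday>" and "next <weekday>"; grouping email threads by normalized subject line is a simplification of real thread reconstruction.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_relative_date(phrase, reference):
    """Map a relative phrase to an absolute date, relative to `reference`
    (e.g. the date an email was sent). Returns None if unrecognized.
    ASSUMPTION: "this X" is the upcoming X (0-6 days ahead) and
    "next X" is the occurrence one week after that."""
    phrase = phrase.strip().lower()
    if phrase == "today":
        return reference
    if phrase == "tomorrow":
        return reference + timedelta(days=1)
    match = re.fullmatch(r"(this|next) (\w+)", phrase)
    if match and match.group(2) in WEEKDAYS:
        target = WEEKDAYS.index(match.group(2))
        days_ahead = (target - reference.weekday()) % 7
        if match.group(1) == "next":
            days_ahead += 7
        return reference + timedelta(days=days_ahead)
    return None


def normalize_subject(subject):
    """Strip reply/forward prefixes such as 'Re:' and 'Fwd:' and lowercase."""
    return re.sub(r"^((re|fwd?)\s*:\s*)+", "", subject.strip(), flags=re.I).lower()


def group_threads(emails):
    """Group (subject, body) pairs into one individual per normalized subject."""
    threads = defaultdict(list)
    for subject, body in emails:
        threads[normalize_subject(subject)].append(body)
    return dict(threads)


sent_on = date(2017, 5, 19)  # a Friday
print(resolve_relative_date("this Wednesday", sent_on))  # 2017-05-24
```

Resolving dates against the message's own timestamp, rather than the processing date, is what makes "tomorrow" in an old email meaningful.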
Algorithm for instantiating script instances
3. Each individual leads to the creation of a candidate instance
of the script (or of one of its subscripts)
4. Fill in some of the script instance's sub-properties
• E.g., a restaurant charge on a credit card bill provides evidence for the
attendEatingOut subscript, filling whereEatingOccurred,
whenEatingOccurred, and one whoAttended.
• A corresponding Facebook check-in could provide the remaining
whoAttended values.
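Steps 3 and 4 can be sketched as one candidate instance per individual, with whatever properties that individual supports filled in. The property names follow the slides; the dict-based PDT records, their keys ("kind", "merchant", "date", "card_holder", "place", "tagged_people") and the fill rules are hypothetical, including the assumption that the card holder is the one known attendee.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AttendEatingOut:
    """Candidate instance of the attendEatingOut subscript
    (camelCase fields mirror the property names on the slides)."""
    source_ids: list = field(default_factory=list)
    whereEatingOccurred: Optional[str] = None
    whenEatingOccurred: Optional[date] = None
    whoAttended: set = field(default_factory=set)


def candidate_from_pdt(pdt_id, pdt):
    """Create a candidate instance and fill the properties the PDT supports.
    HYPOTHETICAL record layout; real PDTs come from the integration layer."""
    inst = AttendEatingOut(source_ids=[pdt_id])
    if pdt["kind"] == "card_charge":
        # A restaurant charge yields where, when, and one attendee
        # (ASSUMED here to be the card holder).
        inst.whereEatingOccurred = pdt["merchant"]
        inst.whenEatingOccurred = pdt["date"]
        inst.whoAttended.add(pdt["card_holder"])
    elif pdt["kind"] == "checkin":
        # A check-in can name the other people who were there.
        inst.whereEatingOccurred = pdt["place"]
        inst.whenEatingOccurred = pdt["date"]
        inst.whoAttended.update(pdt["tagged_people"])
    return inst


charge = {"kind": "card_charge", "merchant": "Luigi's",
          "date": date(2017, 3, 4), "card_holder": "Alice"}
print(candidate_from_pdt("bank-17", charge))
```

Leaving unknown properties as None keeps each candidate honest about what its evidence actually supports; later merging fills the gaps.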
Algorithm for instantiating script instances
5. Score each instance according to the strength of the evidence it
provides.
• strong evidence:
• Bank statement
• A long email thread that mentions the keywords many times and in which the user participates
actively
• weak evidence:
• A single email mentioning the word “lunch”
• mild evidence: the user sent the message and “lunch” appears in the Subject
• null evidence: email from an unknown sender
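A rule-based scorer along these lines could implement step 5. The evidence categories come from the slide; the numeric scores, the thread-length and keyword-count thresholds, and the email field names are all hypothetical, chosen only to make the sketch concrete.

```python
# HYPOTHETICAL score for each evidence level named on the slide.
EVIDENCE_SCORES = {"strong": 0.9, "mild": 0.5, "weak": 0.2, "null": 0.0}


def evidence_level(pdt, known_contacts):
    """Classify a PDT (a dict with HYPOTHETICAL keys) into an evidence level."""
    if pdt["kind"] == "bank_statement":
        return "strong"
    if pdt["kind"] == "email":
        # Mail from an unknown sender that the user did not write: no evidence.
        if pdt["sender"] not in known_contacts and not pdt["sent_by_user"]:
            return "null"
        # Long thread, many keyword mentions, active user (ASSUMED thresholds).
        if (pdt["thread_length"] >= 5 and pdt["keyword_hits"] >= 5
                and pdt["user_messages"] >= 2):
            return "strong"
        if pdt["sent_by_user"] and pdt["keyword_in_subject"]:
            return "mild"
        return "weak"
    return "weak"


def score(pdt, known_contacts):
    """Numeric score in [0, 1] for a single PDT."""
    return EVIDENCE_SCORES[evidence_level(pdt, known_contacts)]


email = {"kind": "email", "sender": "me@example.com", "sent_by_user": True,
         "thread_length": 1, "keyword_hits": 1, "user_messages": 1,
         "keyword_in_subject": True}
print(evidence_level(email, {"bob@example.com"}))  # mild
```

Keeping the scores in [0, 1] lets them be read as rough probabilities, which is what the merge formula in step 6 assumes.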
Algorithm for instantiating script instances
6. Merge instances sharing the same/similar “key parts”:
• whenEatingOccurred, whereEatingOccurred, and to a lesser
extent, whoAttended.
• The local why and what properties of this script are of secondary
importance (e.g., two instances of eating pizza need not be merged)
• Merge documents when:
1. the “When” property is the same/close
2. the “Where”/“Who” is the same, if the tf-idf for the term is low.
• Merge the property fillers; the merged score becomes $1 - \prod_{s \in S_0} \bigl(1 - \mathrm{Score}(s)\bigr)$,
where $S_0$ is the set of script instances being merged.
• Repeat merging as additional subproperties are filled.
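The merge step and its score combination (a noisy-OR, $1 - \prod_{s \in S_0}(1 - \mathrm{Score}(s))$) can be sketched as below. Assumptions, all hypothetical: instances are dicts with keys "when" (a datetime), "where" (a string or None), "who" (a set) and "score"; "close" means within 3 hours; merging is a single greedy pass; and the slide's tf-idf condition on where/who terms is omitted.

```python
from datetime import datetime, timedelta
from math import prod

MAX_GAP = timedelta(hours=3)  # HYPOTHETICAL threshold for a "close" When


def combined_score(scores):
    """Noisy-OR: 1 - prod(1 - s) over the merged instances' scores."""
    return 1 - prod(1 - s for s in scores)


def compatible(a, b):
    """When must be close; Where must match if both are known,
    otherwise the Who sets must overlap."""
    if abs(a["when"] - b["when"]) > MAX_GAP:
        return False
    if a["where"] and b["where"]:
        return a["where"].lower() == b["where"].lower()
    return bool(a["who"] & b["who"])


def merge_instances(instances):
    """Greedily fold each instance (in time order) into the first compatible group."""
    merged = []
    for inst in sorted(instances, key=lambda i: i["when"]):
        for group in merged:
            if compatible(group, inst):
                group["where"] = group["where"] or inst["where"]  # fill gaps
                group["who"] |= inst["who"]
                group["scores"].append(inst["score"])
                break
        else:
            merged.append({"when": inst["when"], "where": inst["where"],
                           "who": set(inst["who"]), "scores": [inst["score"]]})
    for group in merged:
        group["score"] = combined_score(group["scores"])
    return merged


print(combined_score([0.9, 0.5]))  # about 0.95
```

The noisy-OR never decreases when evidence is added, so a weak email corroborated by a bank charge ranks above either alone. Re-running the pass after new sub-properties are filled corresponds to the slide's "repeat merging".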
Case Study: Eating Out
• Goal: Find, among users’ personal data, instances of eating at
various restaurants.
• Three users: Alice, Bob, Charlie
• Six months of data
• Four types of sources:
• messaging (e.g., email, Facebook Messenger, Hangouts)
• calendaring (e.g., Google Calendar)
• financial transactions (e.g., bank and credit card statements)
• location services (e.g., Foursquare, Facebook check-ins).
Objects relevant to the Eating_Out script
Note: the fact that an object is relevant does not mean that it was actually part of an Eating Out event.
Golden set
• Identifying the golden set a posteriori is difficult: we
cannot expect our users to accurately remember every single
instance of Eating Out.
• Every user carefully went over the six months of recorded PDTs
and identified all data that pertained to Eating Out events.
Evaluation Metrics
• Percentage of events retrieved: the percentage of all user-identified
Eating Out events retrieved by our scripts, used as a proxy
for recall.
• Overall Precision: the percentage of identified
script instances that correspond to actual Eating Out events.
• Precision@k: the percentage of the top-k (by merged
score) script instances that correspond to actual Eating Out
events.
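The three metrics above can be computed as follows. This sketch assumes a hypothetical representation: each identified script instance is recorded as the ID of the golden-set event it matches (or None if it matches no real event), listed in decreasing order of merged score.

```python
def percent_events_retrieved(ranked_matches, gold_events):
    """Share of user-identified events hit by at least one instance (recall proxy)."""
    found = {m for m in ranked_matches if m is not None}
    return 100.0 * len(found & set(gold_events)) / len(gold_events)


def overall_precision(ranked_matches):
    """Share of identified instances that correspond to a real event."""
    return 100.0 * sum(m is not None for m in ranked_matches) / len(ranked_matches)


def precision_at_k(ranked_matches, k):
    """Precision over the top-k instances by merged score."""
    return overall_precision(ranked_matches[:k])


gold = ["e1", "e2", "e3", "e4"]          # HYPOTHETICAL golden set
ranked = ["e1", "e2", None, "e3", None]  # HYPOTHETICAL ranked matches
print(percent_events_retrieved(ranked, gold))  # 75.0
print(overall_precision(ranked))               # 60.0
print(precision_at_k(ranked, 2))               # 100.0
```

Note that the recall proxy counts distinct events, so two instances matching the same event count once there but twice toward precision.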
Experimental results
Experimental results
Precision@k
Precision@k
[Figure: Precision@k plots, one panel each for Alice, Bob, Charlie, and Charlie + spouse]
Implementation Challenges
• Evaluation
• Personal data is sensitive
• Data retrieval is complex
• IRB - privacy
• Misclassified “restaurants” in bank statements
• Use of Google Maps: partial success
• Need for NLP analysis
• E.g., we miss “cannot make it for dinner”
• Personalization issues: each person uses PDT sources consistently but
very differently (e.g., shared bank accounts)
Conclusions and Future Work
• First step towards creation of a PKB for personal data
exploration
• Future work:
• Extensible approach for implementing script instantiation from PDTs.
• declarative description of scripts
• declarative description of clues/evidence
• declarative description of information to extract from each relevant PDT
• Script personalization
• Extended user experiments
• Visualization tools
Thank you!

More Related Content

Similar to Integration and Exploration of Connected Personal Digital Traces

Teaching Data Management
Teaching Data Management Teaching Data Management
Teaching Data Management Elaine Martin
 
Cal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPToolCal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPToolCarly Strasser
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
 
Unpacking Steps 3 to5 of The Big Six Research Process
Unpacking Steps 3 to5 of The Big Six Research ProcessUnpacking Steps 3 to5 of The Big Six Research Process
Unpacking Steps 3 to5 of The Big Six Research Processekhoogestraat
 
IT Operations Breakout Session
IT Operations Breakout SessionIT Operations Breakout Session
IT Operations Breakout SessionSplunk
 
Preservation for all: the future of government documents and the “digital FDL...
Preservation for all: the future of government documents and the “digital FDL...Preservation for all: the future of government documents and the “digital FDL...
Preservation for all: the future of government documents and the “digital FDL...James Jacobs
 
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeFrom Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeCraig Knoblock
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planC. Tobin Magle
 
RDAP 15: “This is just for me”: Researchers on their data documentation pract...
RDAP 15: “This is just for me”: Researchers on their data documentation pract...RDAP 15: “This is just for me”: Researchers on their data documentation pract...
RDAP 15: “This is just for me”: Researchers on their data documentation pract...ASIS&T
 
Personal Information Search and Discovery
Personal Information Search and DiscoveryPersonal Information Search and Discovery
Personal Information Search and DiscoveryAmélie Marian
 
Preventing data loss
Preventing data lossPreventing data loss
Preventing data lossIUPUI
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data SharingAnita de Waard
 
Advanced Research Investigations for SIU Investigators
Advanced Research Investigations for SIU InvestigatorsAdvanced Research Investigations for SIU Investigators
Advanced Research Investigations for SIU InvestigatorsSloan Carne
 
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataExploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataShenghui Wang
 
Ethnographically-inspired usability testing
Ethnographically-inspired usability testingEthnographically-inspired usability testing
Ethnographically-inspired usability testingLeah Emary
 
Text Analysis Methods for Digital Humanities
Text Analysis Methods for Digital HumanitiesText Analysis Methods for Digital Humanities
Text Analysis Methods for Digital HumanitiesHelen Bailey
 
Spreading the word
Spreading the wordSpreading the word
Spreading the wordIna Smith
 

Similar to Integration and Exploration of Connected Personal Digital Traces (20)

Teaching Data Management
Teaching Data Management Teaching Data Management
Teaching Data Management
 
Cal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPToolCal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPTool
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social Sciences
 
Unpacking Steps 3 to5 of The Big Six Research Process
Unpacking Steps 3 to5 of The Big Six Research ProcessUnpacking Steps 3 to5 of The Big Six Research Process
Unpacking Steps 3 to5 of The Big Six Research Process
 
IT Operations Breakout Session
IT Operations Breakout SessionIT Operations Breakout Session
IT Operations Breakout Session
 
Preservation for all: the future of government documents and the “digital FDL...
Preservation for all: the future of government documents and the “digital FDL...Preservation for all: the future of government documents and the “digital FDL...
Preservation for all: the future of government documents and the “digital FDL...
 
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeFrom Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
RDAP 15: “This is just for me”: Researchers on their data documentation pract...
RDAP 15: “This is just for me”: Researchers on their data documentation pract...RDAP 15: “This is just for me”: Researchers on their data documentation pract...
RDAP 15: “This is just for me”: Researchers on their data documentation pract...
 
Personal Information Search and Discovery
Personal Information Search and DiscoveryPersonal Information Search and Discovery
Personal Information Search and Discovery
 
Preventing data loss
Preventing data lossPreventing data loss
Preventing data loss
 
Dm2 e okfn-infoday_scholarly_activities_18_nov
Dm2 e okfn-infoday_scholarly_activities_18_novDm2 e okfn-infoday_scholarly_activities_18_nov
Dm2 e okfn-infoday_scholarly_activities_18_nov
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Advanced Research Investigations for SIU Investigators
Advanced Research Investigations for SIU InvestigatorsAdvanced Research Investigations for SIU Investigators
Advanced Research Investigations for SIU Investigators
 
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataExploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadata
 
Ethnographically-inspired usability testing
Ethnographically-inspired usability testingEthnographically-inspired usability testing
Ethnographically-inspired usability testing
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Text Analysis Methods for Digital Humanities
Text Analysis Methods for Digital HumanitiesText Analysis Methods for Digital Humanities
Text Analysis Methods for Digital Humanities
 
Spreading the word
Spreading the wordSpreading the word
Spreading the word
 
DMPTool Webinar 11: Complementary Tools
DMPTool Webinar 11: Complementary ToolsDMPTool Webinar 11: Complementary Tools
DMPTool Webinar 11: Complementary Tools
 

More from Amélie Marian

Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015Amélie Marian
 
Personal Information Management Systems - EDBT/ICDT'15 Tutorial
Personal Information Management Systems - EDBT/ICDT'15 TutorialPersonal Information Management Systems - EDBT/ICDT'15 Tutorial
Personal Information Management Systems - EDBT/ICDT'15 TutorialAmélie Marian
 
Personalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random WalksPersonalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random WalksAmélie Marian
 
Corroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative StatementsCorroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative StatementsAmélie Marian
 
Searching data with substance and style
Searching data with substance and styleSearching data with substance and style
Searching data with substance and styleAmélie Marian
 

More from Amélie Marian (6)

Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015Miettes de données - Keynote BDA 2015
Miettes de données - Keynote BDA 2015
 
Personal Information Management Systems - EDBT/ICDT'15 Tutorial
Personal Information Management Systems - EDBT/ICDT'15 TutorialPersonal Information Management Systems - EDBT/ICDT'15 Tutorial
Personal Information Management Systems - EDBT/ICDT'15 Tutorial
 
Personalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random WalksPersonalizing Forum Search using Multidimensional Random Walks
Personalizing Forum Search using Multidimensional Random Walks
 
Corroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative StatementsCorroborating Facts from Affirmative Statements
Corroborating Facts from Affirmative Statements
 
Searching Web Forums
Searching Web ForumsSearching Web Forums
Searching Web Forums
 
Searching data with substance and style
Searching data with substance and styleSearching data with substance and style
Searching data with substance and style
 

Recently uploaded

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 

Recently uploaded (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 

Integration and Exploration of Connected Personal Digital Traces

  • 1. Integration and Exploration of Connected Personal Digital Traces Valia Kalokyri, Alex Borgida, Amélie Marian, Daniela Vianna Rutgers University
  • 2. Personal data is fragmented, heterogeneous 5/19/17 Amélie Marian - Rutgers University - ExploreDB'17 2
  • 3. DigitalSelf Project: Goals 1. Integrate personal from various heterogeneous sources 2. Design of a unified and intuitive model to link and represent personal information 3. Group personal data with respect to conceptually coherent episodes – Creation of a Personal Knowledge Base 4. Search tools for digital memories 5. Design of interactive tools to provide users with narrative views of their digital memories. 5/19/17 Amélie Marian - Rutgers University - ExploreDB'17 3
  • 4. PIM – Personal Information Management • Traditional PIM Systems – focus on objects relationships • Haystack • Semex • OntoPim • … • We focus on a narrative of events • Exploration of connections between events – or Personal Data Traces (PDTs) 5/19/17 Amélie Marian - Rutgers University - ExploreDB'17 4
  • 5. Background • Research in psychology: Episodic memory – memory of autobiographical events • It is the collection of past personal experiences that occurred at a particular time and place. (times, places, associated emotions, and other contextual who, what, when, where, why knowledge that can be explicitly stated/conjured) • Natural way to remember past events is by pertinent contextual information; answers to: • What, When, Where, Who, What, Why, How (w5h) • Derived from the "frame" structure of events which involve the digital documents 5/19/17 Amélie Marian - Rutgers University - ExploreDB'17 5
  • 6. Integrating Personal Data • Create an infrastructure to retrieve and store personal data • Gather content from several online services (via APIs, IMAP) • Social data - Facebook,Twitter, LinkedIn • Geolocation data - Foursquare • Email - Gmail, or any other email • Calendars - Google Calendar • Personal files - local file system, Google Drive, Dropbox • Web browsing histories - Chrome, Firefox • Apply entity resolution – who, where dimension IIWeb’14 paper, Github open source 5/19/17 Amélie Marian - Rutgers University - ExploreDB'17 6
  • 7. Contributions • High-level description of episodic scripts • Group events (PDTs) to connect them into a memory episode • Scripts: prototypical plans, “a predetermined, stereotyped sequence of actions that defines a well-known situation” (Schank and Abelson) • Heuristic algorithm to find and combine PDTs into scripts • Case study: Eating Out script • Script description • Evaluation with user data • Goal: organize and summarize PDTs into episodes • Allow users to explore, understand and learn from their actions
  • 8. Grouping Data into Coherent Episodes • Provide a narrative by making connections between PDTs • Example: going out to eat at a restaurant • The script would describe the possible “event flows” (arrange where & when to go, make a reservation, call a cab/Uber, go to the restaurant, order food, [...], pay, [...], return, [...]) • Emails concerning a dinner • OpenTable reservation at a restaurant • Foursquare check-in with photos • Credit card payment • Narrative for going out to a dinner
  • 9. Ontology for scripts; UML activity diagram for Eating_Out (activities: InitiateGoingOut, EstablishWhereEat, EstablishWhenEat, EstablishWhoEat, MakeReservation <restaurant>, AttendEatingOut)
  • 10. Algorithm for instantiating script instances 1. Create a list of “trigger words/phrases” whose occurrence indicates that a document has something to do with an instance of a particular script type. • Start with goal events/subscripts, e.g. AttendEatingOut • E.g. “eat”, “eat out” and all their synonyms and hyponyms (WordNet, ConceptNet 5) • Consider the w5h participants of the goal event (VerbNet, FrameNet) • E.g. “restaurant” is a where value of “eat” for Eating_Out • The result is a list of words to search for • E.g. breakfast, lunch, dinner, restaurant and its hyponyms, etc.
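Step 1 can be sketched as a transitive expansion of seed words, assuming a small hard-coded lexicon stands in for WordNet, ConceptNet 5, VerbNet and FrameNet. `LEXICON`, `expand_triggers` and `find_triggers` are hypothetical names, and the word lists are illustrative only.

```python
import re

# Hypothetical toy lexicon standing in for WordNet/ConceptNet (synonyms,
# hyponyms) and for VerbNet/FrameNet (w5h participants such as the "where"
# of eating). A real system would query those resources instead.
LEXICON: dict[str, list[str]] = {
    "eat": ["eat out", "dine", "breakfast", "brunch", "lunch", "dinner"],
    "dine": ["dine out"],
    "restaurant": ["bistro", "diner", "pizzeria", "steakhouse"],
}


def expand_triggers(seeds: list[str], lexicon: dict[str, list[str]]) -> set[str]:
    """Transitively add every related term reachable from the seed words."""
    triggers: set[str] = set()
    stack = list(seeds)
    while stack:
        word = stack.pop()
        if word not in triggers:
            triggers.add(word)
            stack.extend(lexicon.get(word, []))
    return triggers


def find_triggers(text: str, triggers: set[str]) -> set[str]:
    """Return the triggers that occur in `text` as whole words or phrases."""
    lowered = text.lower()
    return {t for t in triggers
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


if __name__ == "__main__":
    # Seeds: the goal event ("eat") and its "where" participant ("restaurant").
    triggers = expand_triggers(["eat", "restaurant"], LEXICON)
    print(sorted(find_triggers("Lunch at the new pizzeria on Friday?", triggers)))
    # ['lunch', 'pizzeria']
```

Because this is plain keyword matching, "Sorry, cannot make it for dinner" still yields `dinner`, which is the kind of miss listed under the implementation challenges.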
  • 11. Algorithm for instantiating script instances 2. All retrieved PDTs are preprocessed: • Entity extraction (Stanford NLP, NLTK) • Who, Where • Time extraction: explicating/disambiguating time information • E.g. “tomorrow” and “this Wednesday” are made absolute dates • Technologies used: Stanford NLP, NLTK, Python dateparser, our own regular expressions • Group certain kinds of documents into single individuals • E.g. email threads, Facebook messages, etc.
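The time-extraction step can be illustrated with a few hand-written rules that resolve relative expressions against the PDT's own timestamp. This standard-library sketch is a stand-in, not the `dateparser` library the authors used; `resolve_relative_date` is a hypothetical helper, and its reading of "this <weekday>" is an assumption.

```python
import re
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]  # same order as date.weekday()
DAY_OFFSETS = {"yesterday": -1, "today": 0, "tonight": 0, "tomorrow": 1}


def resolve_relative_date(phrase: str, reference: date) -> Optional[date]:
    """Map a relative time phrase to an absolute date, or None if unsupported.

    `reference` is the timestamp of the PDT the phrase came from (for
    instance, the date an e-mail was sent).
    """
    p = " ".join(phrase.lower().split())
    if p in DAY_OFFSETS:
        return reference + timedelta(days=DAY_OFFSETS[p])
    m = re.fullmatch(r"this (\w+)", p)
    if m and m.group(1) in WEEKDAYS:
        # Assumption: "this <weekday>" = the first such day on or after `reference`.
        delta = (WEEKDAYS.index(m.group(1)) - reference.weekday()) % 7
        return reference + timedelta(days=delta)
    return None


if __name__ == "__main__":
    sent = date(2017, 5, 19)  # a Friday
    print(resolve_relative_date("tomorrow", sent))        # 2017-05-20
    print(resolve_relative_date("this Wednesday", sent))  # 2017-05-24
    print(resolve_relative_date("next month", sent))      # None
```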
  • 12. Algorithm for instantiating script instances 3. Each individual leads to the creation of a candidate instance of the script (or of one of its subscripts) 4. Fill some of the script instance's sub-properties • E.g. a restaurant charge in a credit card bill provides evidence for the AttendEatingOut subscript, with whereEatingOccurred, whenEatingOccurred and one whoAttended value. • A corresponding Facebook check-in could give information about the remaining whoAttended values
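Steps 3 and 4 can be sketched with a small record type whose fields mirror the script properties named on the slide. The class layout, the dictionary keys of the charge and check-in records, and the same-place/same-day matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ScriptInstance:
    """Candidate instance of an Eating_Out subscript (hypothetical layout)."""
    subscript: str                                     # e.g. "AttendEatingOut"
    where: Optional[str] = None                        # whereEatingOccurred
    when: Optional[date] = None                        # whenEatingOccurred
    who: set[str] = field(default_factory=set)         # whoAttended
    sources: list[str] = field(default_factory=list)   # ids of supporting PDTs


def instance_from_charge(charge: dict, card_holder: str) -> ScriptInstance:
    """A restaurant charge yields the place, the date and one attendee."""
    return ScriptInstance(subscript="AttendEatingOut",
                          where=charge["merchant"], when=charge["date"],
                          who={card_holder}, sources=[charge["id"]])


def fill_from_checkin(inst: ScriptInstance, checkin: dict) -> ScriptInstance:
    """A check-in at the same place on the same day adds the other attendees."""
    if inst.where == checkin["place"] and inst.when == checkin["date"]:
        inst.who |= set(checkin["tagged"])
        inst.sources.append(checkin["id"])
    return inst


if __name__ == "__main__":
    charge = {"id": "card-17", "merchant": "Luigi's Trattoria",
              "date": date(2017, 3, 4)}
    checkin = {"id": "fb-42", "place": "Luigi's Trattoria",
               "date": date(2017, 3, 4), "tagged": ["Bob", "Charlie"]}
    inst = fill_from_checkin(instance_from_charge(charge, "Alice"), checkin)
    print(sorted(inst.who), inst.sources)
```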
  • 13. Algorithm for instantiating script instances 5. Score each instance according to the strength of the evidence it provides. • strong evidence: • Bank statement • A long email thread mentioning keywords many times, with the user participating actively in the exchange • weak evidence: • A single email mentioning the word “lunch” • mild evidence: user sent the message, “lunch” in the Subject • null evidence: email from an unknown sender
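The evidence levels of step 5 can be expressed as ordered rules over a grouped PDT. The slide gives no numbers, so the thresholds (5 messages, 3 keyword hits, 2 user messages), the scores in `EVIDENCE_SCORE` and the record keys are all hypothetical.

```python
# Hypothetical numeric values: the slide names the levels but gives no numbers.
EVIDENCE_SCORE = {"strong": 0.9, "mild": 0.6, "weak": 0.3, "null": 0.0}


def evidence_level(pdt: dict, known_contacts: set[str]) -> str:
    """Classify a grouped PDT with the slide's heuristics (thresholds assumed)."""
    if pdt["kind"] == "bank_statement":
        return "strong"
    # The remaining rules apply to an e-mail thread grouped into one individual.
    if not pdt["user_sent"] and pdt["sender"] not in known_contacts:
        return "null"      # e-mail from an unknown sender
    if pdt["thread_len"] >= 5 and pdt["keyword_hits"] >= 3 and pdt["user_msgs"] >= 2:
        return "strong"    # long thread, many keyword hits, user participates
    if pdt["user_sent"] and pdt["subject_has_trigger"]:
        return "mild"      # user sent it, e.g. "lunch" in the Subject
    if pdt["keyword_hits"] >= 1:
        return "weak"      # a single mention of e.g. "lunch"
    return "null"


if __name__ == "__main__":
    friends = {"bob@example.com"}
    single_mention = {"kind": "email", "sender": "bob@example.com",
                      "user_sent": False, "subject_has_trigger": False,
                      "keyword_hits": 1, "thread_len": 1, "user_msgs": 0}
    level = evidence_level(single_mention, friends)
    print(level, EVIDENCE_SCORE[level])  # weak 0.3
```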
  • 14. Algorithm for instantiating script instances 6. Merge instances sharing the same/similar “key parts”: • whenEatingOccurred, whereEatingOccurred and, to a lesser extent, whoAttended. • The why and what local properties of this script are of secondary importance (instances of eating pizza need not be merged) • Merge documents when: 1. the “When” property is the same/close 2. the “Where”/“Who” property is the same, if the tf-idf for the term is low. • Merge the property fillers; the merged score becomes $1 - \prod_{s \in S_0} \bigl(1 - \mathrm{Score}(s)\bigr)$, where $S_0$ is the set of script instances being merged. • Repeat merging as additional subproperties are filled.
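Step 6 can be sketched as a greedy merge over candidate instances, with the merged score $1 - \prod_{s \in S_0} (1 - \mathrm{Score}(s))$. This has the same form as a noisy-OR: if each score were an independent probability that its instance reflects a real event, the result would be the probability that at least one does. The merge test here is simplified (dates within one day, no conflicting place) and omits the tf-idf condition; the dictionary layout is an assumption.

```python
from datetime import date
from math import prod


def combined_score(scores: list[float]) -> float:
    """Score of a merged instance: 1 - prod(1 - Score(s)) over merged instances s."""
    return 1 - prod(1 - s for s in scores)


def can_merge(a: dict, b: dict, max_days: int = 1) -> bool:
    """Simplified key-part test: close dates and no conflicting place.

    The tf-idf condition on "where"/"who" from the slide is omitted here.
    """
    if abs((a["when"] - b["when"]).days) > max_days:
        return False
    return not (a["where"] and b["where"] and a["where"] != b["where"])


def merge_all(instances: list[dict]) -> list[dict]:
    """Greedily fold each instance into the first compatible merged group."""
    merged: list[dict] = []
    for inst in sorted(instances, key=lambda x: x["when"]):
        for group in merged:
            if can_merge(group, inst):
                group["where"] = group["where"] or inst["where"]  # fill a missing place
                group["who"] |= inst["who"]
                group["scores"] += inst["scores"]
                break
        else:  # no compatible group: start a new one (copy to avoid aliasing)
            merged.append({"when": inst["when"], "where": inst["where"],
                           "who": set(inst["who"]), "scores": list(inst["scores"])})
    return merged


if __name__ == "__main__":
    pdts = [
        {"when": date(2017, 3, 4), "where": "Luigi's Trattoria", "who": {"Alice"}, "scores": [0.9]},
        {"when": date(2017, 3, 4), "where": None, "who": {"Alice", "Bob"}, "scores": [0.6]},
        {"when": date(2017, 4, 10), "where": "Sushi Bar", "who": {"Alice"}, "scores": [0.3]},
    ]
    for g in merge_all(pdts):
        print(g["when"], g["where"], sorted(g["who"]), round(combined_score(g["scores"]), 2))
```

With these inputs, the card charge (0.9) and the same-day e-mail (0.6) merge into one instance scoring 1 − 0.1 × 0.4 = 0.96, while the April instance stays separate.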
  • 15. Case Study: Eating Out • Goal: find, among users’ personal data, instances of eating at various restaurants. • Three users: Alice, Bob, Charlie • Six months of data • Four types of sources: • messaging (e.g., email, Facebook Messenger, Hangouts) • calendaring (e.g., Google Calendar) • financial transactions (e.g., bank and credit card statements) • location services (e.g., Foursquare, Facebook check-ins).
  • 16. Objects relevant to the Eating_Out script • Note: the fact that an object is relevant does not mean that it was actually part of an Eating Out event.
  • 17. Golden set • Identifying the golden set a posteriori is difficult: we cannot expect our users to accurately remember every single instance of Eating Out. • Every user carefully went over the six months of recorded PDTs and identified all data that pertained to Eating Out events.
  • 18. Evaluation Metrics • Percentage of events retrieved: percentage of all user-identified Eating Out events retrieved by our scripts, as a proxy for Recall. • Overall Precision: percentage of identified script instances that correspond to actual Eating Out events. • Precision@k: percentage of the top-k script instances (ranked by merged score) that correspond to actual Eating Out events.
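The three metrics can be computed from a list of identified instances, each labeled with the user-identified event it matches (or `None` for a false positive). The data layout and function names are assumptions; "events retrieved" is only a proxy for Recall because the golden set may itself be incomplete.

```python
from typing import Optional

# Each identified script instance: (merged score, id of the user-identified
# Eating Out event it corresponds to, or None if it is a false positive).
Instance = tuple[float, Optional[str]]


def events_retrieved(instances: list[Instance], golden: set[str]) -> float:
    """Share of golden-set events hit by at least one instance (recall proxy)."""
    found = {ev for _, ev in instances if ev in golden}
    return len(found) / len(golden) if golden else 0.0


def overall_precision(instances: list[Instance], golden: set[str]) -> float:
    """Share of identified instances that correspond to an actual event."""
    if not instances:
        return 0.0
    return sum(ev in golden for _, ev in instances) / len(instances)


def precision_at_k(instances: list[Instance], golden: set[str], k: int) -> float:
    """Precision among the k highest-scoring instances."""
    top = sorted(instances, key=lambda inst: inst[0], reverse=True)[:k]
    if not top:
        return 0.0
    return sum(ev in golden for _, ev in top) / len(top)
```

Two instances matching the same event both count as correct for precision but only once for events retrieved.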
  • 19. Experimental results
  • 20. Experimental results
  • 21. Precision@k (panels: Alice, Bob, Charlie, Charlie + spouse)
  • 22. Precision@k
  • 23. Implementation Challenges • Evaluation • Personal data is sensitive • Data retrieval is complex • IRB - privacy • Misclassified “restaurants” in bank statements • Use of Google Maps: partial success • Need for NLP analysis • E.g., we miss “cannot make it for dinner” • Personalization issues: each person uses PDTs consistently, but very differently (e.g. shared bank accounts)
  • 24. Conclusions and Future Work • First step towards the creation of a PKB for personal data exploration • Future work: • Extensible approach for implementing script instantiation from PDTs • declarative description of scripts • declarative description of clues/evidence • declarative description of information to extract from each relevant PDT • Script personalization • Extended user experiments • Visualization tools

Editor's Notes

  1. Digital data is inherently contextual due to various forms of metadata.
  2. The idea of a narrative is supported by the notion of “episodic memory” (Tulving, 2002).
  3. As a proof of concept, we implemented our scripts for the Eating Out scenario. Performing experiments on personal data is not a trivial endeavor, due to the sensitive nature of the data and the difficulty of obtaining personal data sets for research purposes. Mint.com is a free, web-based personal financial management service.
  4. Relevance was computed using keyword-based scoring for Email/Messaging and Calendar data, and using the metadata categories stored with the original data items for Financial and Location data; these categories were verified and corrected using the Google Maps API. A relevant object is not necessarily part of an Eating Out event: Alice may have discussed a restaurant in messages with friends but not gone there, or Charlie may have bought food at a business categorized both as a supermarket and a restaurant. The three users have very different patterns, as expected given the highly individual nature of user behavior. Charlie shares a credit card account with her spouse; therefore some of the 125 relevant financial data objects are not from her credit card (only 49 are).
  5. To evaluate the quality of the memory retrieval process using our scripts, we need to identify all the instances of Eating Out for each user, i.e., a golden set. Without a perfect golden set, we cannot accurately evaluate Recall.
  6. This slide shows the percentage of user-identified events retrieved by our script for our three users. A first observation is that the results clearly reflect the different behaviors of the three users. Alice and Bob use email/messaging to make restaurant plans in a majority of cases, but do not always have a financial record of the transaction. In contrast, Charlie makes very few plans by email/messaging and does not enter them in her calendar, but most of her outings result in financial transactions. The results show not only that looking at several sources of information is critical to identifying a user's script instances, since the percentage of events retrieved increases with the number of sources considered, but also that any approach to retrieving user memories of events must consider several sources to adapt to the wide variety of user behaviors.
  7. The quality of the information given by different sources varies. Financial data tend to be of high quality; false positives arise when the user ordered takeout or bought groceries at a business doubling as a restaurant. Email/messaging data, which depend on keyword matching for relevance, tend to be of lower quality. Hence the need to merge information from multiple sources of personal data to improve the identification of script instances, and to consider a variety of Personal Information sources to account for the different individual behaviors of users.
  8. However, retrieval systems typically return results in ranked order, and users expect the first few results to be the most relevant. Alice: although her financial data is of very high quality, it exists for only 67% of her Eating Out events. By combining Email/Messaging and Financial data, her Eating Out events are identified with high accuracy for all values of k. Charlie: similar pattern, with lower accuracy for Email/Messaging. Bob: his financial data is not as accurate as expected, because the categorization provided by the financial provider is inaccurate for several of his transactions.
  9. When information from multiple sources is combined, precision is higher than when sources are considered individually, especially for low values of k, i.e., the first instances returned to the users.