Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Personal	
  Informa-on	
  	
  
Management	
  Systems	
  
Serge	
  Abiteboul	
  
INRIA	
  &	
  ENS	
  Cachan	
  
serge.abit...
Personal	
  data	
  is	
  everywhere	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   2	
  
Personal	
  data	
  is	
  exploding	
  	
  
•  Ac-vely:	
  Data	
  and	
  metadata	
  we	
  produce	
  
–  Pictures,	
  re...
Personal	
  data	
  is	
  heterogeneous	
  
•  Structured:	
  rela-onal	
  
•  Semistructured:	
  HTML,	
  XML,	
  Jason…	...
•  Loss	
  of	
  func-onali-es	
  because	
  of	
  fragmenta-on	
  
–  You	
  don’t	
  know	
  where	
  your	
  data	
  is...
Alterna-ves	
  
1.  Con-nue	
  with	
  this	
  increasing	
   	
   	
   	
   	
   	
  
	
  mess	
  
–  Use	
  a	
  shrink	...
The	
  -me	
  for	
  PIMS	
  is	
  now!	
  
A	
  memex	
  is	
  a	
  device	
  in	
  which	
  an	
  individual	
  stores	
...
The	
  PIMS:	
  A	
  change	
  in	
  paradigm	
  
Using	
  Web	
  services	
  today	
  
•  Your	
  data	
  
•  Running	
  ...
PIMS	
  in	
  the	
  Past	
  
Saving	
  Personal	
  Data	
  –	
  Old	
  School	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   10	
  
Searching	
  Personal	
  Data	
  –	
  Old	
  School…	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   11	
  File	...
Personal	
  Informa-on	
  Management	
  –	
  
the	
  Digital	
  Age	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	...
First-­‐genera-on	
  Personal	
  Informa-on	
  
Management	
  Systems	
  
•  Storage	
  
– Archival,	
  safe-­‐keeping	
  ...
Desktop	
  Search	
  Tools	
  
•  Google	
  Desktop	
  Search	
  (defunct)	
  
•  Apple	
  Spotlight	
  
•  Windows	
  Sea...
Past	
  PIMS	
  projects	
  	
  
(late	
  1990’s,	
  2000’s)	
  	
  
•  Lifestreams	
  
–  Time	
  oriented	
  streams	
  ...
LifeStreams	
  
(Freeman	
  and	
  Gelertner,	
  Yale,	
  1996-­‐1997)	
  
Help	
  users	
  
manage	
  their	
  
informa-o...
Haystack	
  	
  
(Karger	
  et	
  al.,	
  MIT	
  CSAIL	
  1997-­‐2005)	
  
Allows	
  users	
  to	
  store,	
  
examine	
  ...
Stuff	
  I’ve	
  Seen	
  	
  
(Dumais	
  et	
  al.	
  Microsos,	
  2003-­‐2004)	
  
•  Unified	
  Index	
  
•  Integra-on	
 ...
A	
  changing	
  landscape	
  
	
  
	
  
Cloud-­‐based	
  model	
  	
  
	
  
	
  
Heterogeneous	
  data	
  types	
  and	
 ...
A	
  vision	
  for	
  the	
  Future	
  of	
  PIMS	
  	
  
	
  
All	
  the	
  digital	
  life	
  of	
  an	
  individual	
  
From	
  Memex	
  to	
  MyLifeBits	
  
Memex	
  
–  Memory	
  i...
Some	
  of	
  the	
  digital	
  life?	
  
•  The	
  “Total	
  Capture	
  vision”	
  has	
  its	
  detractors	
  
•  Advant...
Hypermnesia	
  
Excep7onally	
  exact	
  or	
  vivid	
  memory,	
  
especially	
  as	
  associated	
  with	
  
certain	
  ...
Nature	
  and	
  value	
  of	
  informa-on	
  
	
  
w5h	
  model	
  (context-­‐based)	
  
	
  
	
  
•  Changes	
  with	
  ...
Storage	
  and	
  Archival	
  	
  
•  Fully	
  under	
  user’s	
  control	
  
•  Fully	
  available	
  on	
  the	
  cloud	...
Data	
  integra-on	
  
•  Old	
  problems	
  revisited	
  
Person-­‐centric	
  	
  
informa-on	
  integra-on	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   27	
  
27	
  
...
Classical	
  data	
  integra-on	
  problem	
  
•  Choose	
  a	
  schema	
  for	
  the	
  
PIMS	
  
•  Choose	
  a	
  mappi...
Classical	
  knowledge	
  integra-on	
  problem	
  
•  Enrich	
  the	
  ontology	
  
–  Align	
  concepts	
  and	
  rela-o...
Illustra-on:	
  en-ty	
  resolu-on	
  
•  Mail	
   •  Contact	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   30...
Searching	
  Personal	
  Informa-on	
  
Memory	
  Tasks	
  
•  The	
  “five	
  Rs”	
  memory	
  tasks	
  	
  
	
   	
   	
  -­‐Sellen	
  and	
  Whitaker,	
  CACM	
...
Recollec-ng	
  
•  Task-­‐based	
  memory	
  process	
  
•  Retracing	
  steps	
  to	
  recollect	
  informa-on	
  
– “Whe...
Reminiscing	
  
•  Browsing	
  through	
  past	
  
memories	
  to	
  re-­‐live	
  
them	
  
•  Experience-­‐based	
  (no	
...
Retrieving	
  
•  Retrieving	
  specific	
  informa-on	
  
– Files,	
  documents,	
  pictures	
  
– Data	
  snippets	
  
• ...
Reflec-ng	
  
•  Learning	
  from	
  the	
  past	
  
– Iden-fy	
  paxerns	
  
– Personal	
  data	
  analysis	
  
•  Towards...
Remembering	
  Inten-ons	
  
•  Focus	
  on	
  prospec-ve	
  memory	
  
–  To-­‐do	
  lists	
  
–  Appointment	
  reminder...
Explaining	
  
•  Users	
  want	
  to	
  understand	
  the	
  informaJon	
  
	
  they	
  see,	
  the	
  answers	
  they	
 ...
Serendipity	
  
•  You	
  may	
  hear	
  by	
  chance	
  a	
  
song	
  that	
  is	
  going	
  to	
  totally	
  
obsess	
  ...
Answer	
  Personaliza-on	
  
•  Modifying	
  the	
  query	
  based	
  on	
  the	
  user’s	
  
ontology	
  and	
  preferenc...
Rich	
  search/queries	
  
Context-­‐aware	
  
•  We	
  remember	
  our	
  data	
  based	
  on	
  
contextual	
  cues	
  	...
Digital	
  Self	
  Architecture	
  @	
  Rutgers	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   42	
  
Architect...
Personal	
  data	
  analy-cs	
  
Aka	
  Small	
  data	
  
Elliox	
  Hedman,	
  Design	
  Research	
  Conference
Personal	
  data	
  analy-cs	
  
•  Rela-vely	
  new	
  topic	
  	
  
–  First	
  Interna7onal	
  Workshop	
  on	
  Person...
Focus:	
  Quan-fied	
  self	
  
•  From	
  sensors	
  &	
  all	
  kind	
  of	
  data	
  
•  Health	
  and	
  well	
  being	...
Towards	
  a	
  Personal	
  Knowledge	
  Base	
  
•  Combine	
  informa-on	
  from	
  different	
  sources	
  to	
  
infer	...
Access	
  control	
  and	
  security	
  
Is	
  privacy	
  needed?	
  
•  Because	
  young	
  people	
  expose	
  personal	
  life	
  online	
  more	
  likely	
  
t...
Different	
  architectures	
  
•  Connec-on	
  with	
  vendors	
  (same	
  
for	
  other	
  services)	
  
•  Secure	
  P2P	...
Secure	
  devices	
  
Amélie	
  &	
  Serge,	
  EDBT,	
  11111011111	
  	
   50	
  
•  Secure	
  portable	
  tokens:	
  Sec...
Reducing	
  or	
  increasing	
  the	
  security	
  risk?	
  
•  An	
  intrusion	
  on	
  my	
  PIMS	
  puts	
  all	
  my	
...
Other	
  issues	
  
•  Self	
  administra-on	
  	
  
•  Synchroniza-on	
  and	
  task	
  sequencing	
  
•  Internet	
  of	...
Support	
  for	
  system	
  administra-on	
  
•  It	
  should	
  require	
  epsilon	
  competence	
  
–  Users	
  are	
  o...
Synchroniza-on	
  and	
  	
  
task	
  sequencing	
  across	
  devices	
  
•  Many	
  possible	
  approaches	
  
•  Service...
A	
  hub	
  for	
  the	
  IoT	
  
•  Internet	
  of	
  things:	
  Interconnec-on	
  of	
  iden-fiable	
  
compu-ng	
  devic...
Conclusion:	
  The	
  PIMS	
  are	
  arriving	
  
For	
  societal,	
  technical,	
  industrial	
  reasons	
  
They	
  will...
Society	
  is	
  ready	
  to	
  move	
  
•  Growing	
  resentment	
  	
  
–  Against	
  companies:	
  intrusive	
  marke-n...
Society	
  is	
  ready	
  to	
  move	
  (2)	
  
•  Privacy	
  control:	
  regula-ons	
  in	
  Europe	
  
•  Informa-on	
  ...
Technology	
  is	
  gearing	
  up	
  
•  System	
  administra-on	
  is	
  easier	
  
–  Abstrac-on	
  technologies	
  for	...
Technology	
  is	
  gearing	
  up	
  (2)	
  
•  Many	
  systems	
  &	
  projects	
  
–  Lifestreams,	
  Stuff-­‐I’ve-­‐Seen...
Industry	
  is	
  interested	
  
	
  (1)	
  Pre-­‐digital	
  companies	
  
•  E.g.,	
  hotels	
  or	
  banks	
  	
  
•  Di...
Industry	
  is	
  interested	
  
	
  (2)	
  Home	
  appliances	
  companies	
  
•  Many	
  boxes	
  deployed	
  at	
  home...
Industry	
  is	
  interested	
  
	
  (3)	
  Pure	
  Internet	
  players	
  
•  Amazon:	
  great	
  know-­‐how	
  in	
  pro...
They	
  will	
  change	
  our	
  lives:	
  	
  
(1)	
  rebalance	
  the	
  Web	
  	
  
•  User	
  control	
  over	
  their...
They	
  will	
  change	
  our	
  lives:	
  	
  
(2)	
  new	
  func-onali-es	
  
1.  Data	
  integra-on	
  
2.  Search	
  a...
(3)	
  So	
  watch	
  out	
  for	
  the	
  killer	
  apps	
  
•  Personal	
  assistant	
  
–  Google	
  now	
  enhanced	
 ...
Come	
  and	
  share	
  PIMS	
  
•  Lots	
  of	
  cool	
  problems	
  
•  Lots	
  of	
  opportuni-es	
  for	
  
your	
  fa...
References	
  
Data	
  IntegraJon:	
  
•  A	
  survey	
  of	
  approaches	
  to	
  automa7c	
  schema	
  matching,	
  Rahm...
References	
  
Data	
  extracJon	
  
•  A	
  tool	
  for	
  personal	
  data	
  extrac7on.	
  D.	
  Vianna,	
  A.-­‐M.	
  ...
References	
  
PIMS:	
  
•  As	
  we	
  may	
  think,	
  Vannevar	
  Bush,	
  the	
  Atlan-c	
  Monthly,	
  2005.	
  
•  P...
Personal Information Management Systems - EDBT/ICDT'15 Tutorial
Upcoming SlideShare
Loading in …5
×

Personal Information Management Systems - EDBT/ICDT'15 Tutorial

3,441 views

Published on

Personal Information Management Systems - EDBT/ICDT'15 Tutorial

Published in: Technology
  • I have a kind of similar projet but one different foundation is that it's not cloud based (actually, I want it to work without Internet - a fully distributed information system that is entity centric). I hope we may talk of the subject.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Excellent work. I've been interested in these subject from a long time (I'm associated professor in system engineering in Tarbes).
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Personal Information Management Systems - EDBT/ICDT'15 Tutorial

  1. 1. Personal  Informa-on     Management  Systems   Serge  Abiteboul   INRIA  &  ENS  Cachan   serge.abiteboul@inria.fr   Amélie  Marian   Rutgers  University   amelie@cs.rutgers.edu  
  2. 2. Personal  data  is  everywhere   Amélie  &  Serge,  EDBT,  11111011111     2  
  3. 3. Personal  data  is  exploding     •  Ac-vely:  Data  and  metadata  we  produce   –  Pictures,  reports,  emails,  calendars,  tweets,  annota-ons,   recommenda-on,  social  network…              Ac-vely:  Data  we  like/buy   –  Books,  music,  movies…   •  Passively:  Data  others  produce  about  us   –  Public  administra-on,  schools,  insurances,  banks…   –  Amazon,  banks,  retailers,  applestore…     •  Stealthily:  sensors   –  GPS,  web  naviga-on,  phone,  "quan-fied  self"  measurements,   contactless  card  readings,  surveillance  camera  pictures…   •  Stealthily:  data  analysis   –  Clicks,  Searches,  TV  viewing  habits  (e.g.,  NeYlix)   –  NSA  inference   3  Amélie  &  Serge,  EDBT,  11111011111    
  4. 4. Personal  data  is  heterogeneous   •  Structured:  rela-onal   •  Semistructured:  HTML,  XML,  Jason…   •  Not  structured:  text  (pdf),  pictures,  music,  video…   •  Metadata:  date,  loca-on…     •  Seman-c:  RDF,  RDFS,  Owl   •  Different  languages,  terminologies,  ontologies,  structures   •  Different  systems,  protocols     •  Varying  quality   4  Amélie  &  Serge,  EDBT,  11111011111    
  5. 5. •  Loss  of  func-onali-es  because  of  fragmenta-on   –  You  don’t  know  where  your  data  is,  how  to  maintain  it  up   to  date,  how  to  get  it  some-mes   –  Difficult  to  do  global  search,  maintenance,   synchroniza-on,  archiving...   •  Loss  of  control  over  the  data   –  Difficult  to  control  privacy   –  Difficult  to  control  sharing     –  Leaks  of  private  informa-on   •  Loss  of  freedom   –  Vendor  lock-­‐in   Bad  news   5  Amélie  &  Serge,  EDBT,  11111011111    
  6. 6. Alterna-ves   1.  Con-nue  with  this  increasing              mess   –  Use  a  shrink  to  overcome                    the  frustra-on   2.  Regroup  all  your  data  on  the  same  plaYorm   –  Google,  Apple,  Facebook,  …,  a  new  comer   –  Use  a  shrink  to  overcome  resentment   3.  Study  2  years  to  become  a  geek   –  Geeks  know  how  to  manage  their  informa-on     –  Use  a  shrink  to  survive  the  experience   6  Amélie  &  Serge,  EDBT,  11111011111     Where  do  you   keep  your  data?  
  7. 7. The  -me  for  PIMS  is  now!   A  memex  is  a  device  in  which  an  individual  stores  all  his  books,  records,  and   communica7ons,  and  which  is  mechanized  so  that  it  may  be  consulted  with   exceeding  speed  and  flexibility.  It  is  an  enlarged  in7mate  supplement  to  his   memory.          Vannevar  Bush,  The  Atlan-c  Monthly,  1945     Defini-on  for  this  talk:  a  Personal  Informa-on   Management  System  is  a  (cloud)  system  that  manages   all  the  informa7on  of  a  person   Amélie  &  Serge,  EDBT,  11111011111     7  
  8. 8. The  PIMS:  A  change  in  paradigm   Using  Web  services  today   •  Your  data   •  Running  with  an  external   service   •  On  some  unknown   machines     Your  PIMS     •  Your  data     •  Running  a  local  service   •  On  your  machine   Possibly  for  external  services   •  A  replica  of  the  data   •  On  a  wrapper     •  On  your  machine   Amélie  &  Serge,  EDBT,  11111011111     8  
  9. 9. PIMS  in  the  Past  
  10. 10. Saving  Personal  Data  –  Old  School   Amélie  &  Serge,  EDBT,  11111011111     10  
  11. 11. Searching  Personal  Data  –  Old  School…   Amélie  &  Serge,  EDBT,  11111011111     11  File  cabinet   around  1888  
  12. 12. Personal  Informa-on  Management  –   the  Digital  Age   Amélie  &  Serge,  EDBT,  11111011111     12   % grep PIMS /home/amelie/presentations
  13. 13. First-­‐genera-on  Personal  Informa-on   Management  Systems   •  Storage   – Archival,  safe-­‐keeping   •  Organiza-on   – Structure   – Different  file  types   •  Finding  and  re-­‐finding  informa-on   – Different  from  tradi-onal  IR/Web  search  systems   – Keyword  searches  not  ideal   Amélie  &  Serge,  EDBT,  11111011111     13  
  14. 14. Desktop  Search  Tools   •  Google  Desktop  Search  (defunct)   •  Apple  Spotlight   •  Windows  Search   •  Lead  to  frustra-on  when  users  cannot  find   informa-on  they  know  they  have   Amélie  &  Serge,  EDBT,  11111011111     14   Use  IR-­‐style  keyword  searches     Some  metadata  filtering  
  15. 15. Past  PIMS  projects     (late  1990’s,  2000’s)     •  Lifestreams   –  Time  oriented  streams   •  Haystack   –  Uniform  data  model   •  Stuff  I’ve  seen   –  History  of  web  behavior   •  Dataspaces   –  Seman-c  connec-ons.  Data   integra-on   •  Connec-ons,  Seetrieve   –  Task-­‐based  organiza-on   •  deskWeb   –  Looks  at  the  social  network   graph         Various  use  of     –  Context   –  Time   –  Social  network     Amélie  &  Serge,  EDBT,  11111011111     15  
  16. 16. LifeStreams   (Freeman  and  Gelertner,  Yale,  1996-­‐1997)   Help  users   manage  their   informa-on     Time-­‐centric  view   of  documents   Amélie  &  Serge,  EDBT,  11111011111     16  
  17. 17. Haystack     (Karger  et  al.,  MIT  CSAIL  1997-­‐2005)   Allows  users  to  store,   examine  and  manipulate   their  informa-on     •  Uniform  Data  Model   •  Semi-­‐structured  Data   •  Captures   rela-onships   •  Separate  Workspaces   Amélie  &  Serge,  EDBT,  11111011111     17  
  18. 18. Stuff  I’ve  Seen     (Dumais  et  al.  Microsos,  2003-­‐2004)   •  Unified  Index   •  Integra-on  of   sources   •  Re-­‐find   informa-on   •  Focus  on  UI   Amélie  &  Serge,  EDBT,  11111011111     18  
  19. 19. A  changing  landscape       Cloud-­‐based  model         Heterogeneous  data  types  and  formats     Need  for  richer  func-onali-es  and  seman-c  analysis   Amélie  &  Serge,  EDBT,  11111011111     19  
  20. 20. A  vision  for  the  Future  of  PIMS      
  21. 21. All  the  digital  life  of  an  individual   From  Memex  to  MyLifeBits   Memex   –  Memory  index  or  memory  extender   –  Hypertext  system  by  Vannevar  Bush  in  1945     –  Compress  and  store  all  of  their  books,  records,   and  communica-ons…   –  Provide  an  "enlarged  in-mate  supplement  to   one's  memory”   MyLifeBits   –  Microsos  Research  project  with  Gordon  Bell   (2006)   –  Life-­‐logging     –  All  documents  read  or  produced  by  Bell,  CDs,      emails,  web  pages  browsed,  phone  and        instant  messaging  conversa-ons,  etc.       Amélie  &  Serge,  EDBT,  11111011111     21  
  22. 22. Some  of  the  digital  life?   •  The  “Total  Capture  vision”  has  its  detractors   •  Advantages  of  selec-ve  human  memory     – Ignore  irrelevant  informa-on  to  avoid  flooding   when  searching  for  something   – Choose  what  to  forget,  e.g.,  unpleasant  memories   •  Perhaps  PIMS  should  also  be  selec-ve   •  More  complicated  than  Total  Capture   Amélie  &  Serge,  EDBT,  11111011111     22  
  23. 23. Hypermnesia   Excep7onally  exact  or  vivid  memory,   especially  as  associated  with   certain  mental  illnesses   For  a  user:  We  cannot  live  knowing   that  any  word,  any  move  will  leave   a  trace?     For  the  ecosystem:  We  cannot  store   all  the  data  we  produce  –  lack  of   storage  resources       23   ForgeGng  is  Key  to  a  Healthy  Mind   Scien7fic  American   Image:  Aaron  Goodman   A  main  issue  is  to  select  the  informaJon  we   choose  to  keep     Amélie  &  Serge,  EDBT,  11111011111    
  24. 24. Nature  and  value  of  informa-on     w5h  model  (context-­‐based)       •  Changes  with  -me   •  Depends  on  many  dimensions:   nature  of  info,  rarity,  age,   personal  bias/taste/opinions…   •  Difficult  to  es-mate  the  cost  to   get  some  info   –  To  es-mate  how  much  you  would   spend  before  you  give  up   •  Difficult  to  es-mate  the  value  of   informa-on  you  don't  have  yet   •  Difficult  for  the  system  to  know   what  a  human  remembers   –  Makes  crowd  sourcing  difficult   Amélie  &  Serge,  EDBT,  11111011111     24  
  25. 25. Storage  and  Archival     •  Fully  under  user’s  control   •  Fully  available  on  the  cloud   –  Without  privacy  risk   •  Fully  resilient  to  failure   –  Automa-c  back-­‐ups   –  Automa-c  synchroniza-on  with  other  systems/devices     •  Support  of  access  control   –  Simple  and  intui-ve  defini-on  across  systems/devices   •  Use  of  encryp-on   –  Data  is  stored  encrypted  in  the  cloud  or  on  a  personal   machine  connected  to  the  cloud   Amélie  &  Serge,  EDBT,  11111011111     25  
  26. 26. Data  integra-on   •  Old  problems  revisited  
  27. 27. Person-­‐centric     informa-on  integra-on   Amélie  &  Serge,  EDBT,  11111011111     27   27   Sue’s   PIMS   …   …   W’1        W1     wrapper   …   Secured   net   Bob   Joe   …   Decentralized  services   (e.g.,  Diaspora)     External   Services   (e.g.,  Facebook)          Wn   wrapper        L1   Lp          D1   Dm     W’n   Local   Services   (e.g.,  Analy-cs)     Sue   S   Server-­‐centric   …  
  28. 28. Classical  data  integra-on  problem   •  Choose  a  schema  for  the   PIMS   •  Choose  a  mapping   between  the  sources  and   the  mediated  schema   •  Extract  &  load  &  maintain   –  Data  and  metada  from   sources   Lots  of  works   –  On  digital  libraries   –  On  database  integra-on   Amélie  &  Serge,  EDBT,  11111011111     28   …   Sue’s   PIMS   Sn   Sn   S1   S1   Wrapper   Wrapper  
  29. 29. Classical  knowledge  integra-on  problem   •  Enrich  the  ontology   –  Align  concepts  and  rela-ons  in   schemas   –  Align  objects   •  Reference  to  external  data     Lots  of  works   –  On  knowledge  representa-on     –  On  knowledge  integra-on     Amélie  &  Serge,  EDBT,  11111011111     29   Imported  knowledge   Alignments   (computed  or  curated)   Curated     knowledge   Imported  ontologies   Personal   ontology  
  30. 30. Illustra-on:  en-ty  resolu-on   •  Mail   •  Contact   Amélie  &  Serge,  EDBT,  11111011111     30   •  Websearch   amelie@gmail.com   Amelie   Marian   from   …        Nikki  de  Saint-­‐Phalle  …  body   grandpalais.fr/ndsp/  url   …    Nikki  de  Saint-­‐Phalle    …  
  31. 31. Searching  Personal  Informa-on  
  32. 32. Memory  Tasks   •  The  “five  Rs”  memory  tasks          -­‐Sellen  and  Whitaker,  CACM  2010   Recollec-ng   Reminiscing   Retrieving   Reflec-ng   Remembering  inten-ons   Amélie  &  Serge,  EDBT,  11111011111     32  
  33. 33. Recollec-ng   •  Task-­‐based  memory  process   •  Retracing  steps  to  recollect  informa-on   – “Where  did  I  leave  my  keys”     – “When  was  the  last  -me  I  saw  Pierre”   •  Follow  a  series  of  cues  to  iden-fy  informa-on   Amélie  &  Serge,  EDBT,  11111011111     33   Need:  ConnecJons  between  memory  objects      (integraJon  and  navigaJon)  
  34. 34. Reminiscing   •  Browsing  through  past   memories  to  re-­‐live   them   •  Experience-­‐based  (no   specific  goal  in  mind)   –   E.g.,  looking  at  old   photos   Amélie  &  Serge,  EDBT,  11111011111     34   Need:  ConnecJons  between  memory  objects      (integraJon  and  navigaJon)  
  35. 35. Retrieving   •  Retrieving  specific  informa-on   – Files,  documents,  pictures   – Data  snippets   •  Use  of  metadata   •  Can  be  combined  with  recollec-on   Amélie  &  Serge,  EDBT,  11111011111     35   Need:  Query  model,  Indexes,        and  Search  algorithms        
  36. 36. Reflec-ng   •  Learning  from  the  past   – Iden-fy  paxerns   – Personal  data  analysis   •  Towards  a  Personal  Knowledge  Base  (PKB)   – Individual  vs.  shared  knowledge   – Privacy  concerns   Amélie  &  Serge,  EDBT,  11111011111     36   Need:  Knowledge  Discovery  and  Mining   techniques  designed  for  personal  data        
  37. 37. Remembering  Inten-ons   •  Focus  on  prospec-ve  memory   –  To-­‐do  lists   –  Appointment  reminders   •  Ac-ve  focus  of  commercial  companies   –  Google  Now   –  No-fica-on  apps  (-me-­‐  or  loca-on-­‐based)   –  Microsos  Personal  Agent  project?   Amélie  &  Serge,  EDBT,  11111011111     37   Need:  NLP  techniques  designed  for                    personal  data        
  38. 38. Explaining   •  Users  want  to  understand  the  informaJon    they  see,  the  answers  they  are  given   –  In  their  professional/social  life     •  Difficul-es   – Reasoning  with  large  number  of  facts     – Informa-on  is  osen  probabilis-c  and  not  public   – Requires  knowing  how  the  informa-on  was   obtained  (its  provenance)   38  Amélie  &  Serge,  EDBT,  11111011111    
  39. 39. Serendipity   •  You  may  hear  by  chance  a   song  that  is  going  to  totally   obsess  you   •  A  librarian  may  suggest   your  reading  a  book  that   will  change  your  life   This  is  serendipity   •  A  perfect  search  engine     •  A  perfect  recommenda-on   system   •  A  perfect  computer  assistant   Such  systems  are  boring       They  lack  serendipity   39   Design  programs  that  would  help  introduce   serendipity  in  our  lives   Amélie  &  Serge,  EDBT,  11111011111    
  40. 40. Answer  Personaliza-on   •  Modifying  the  query  based  on  the  user’s   ontology  and  preferences   •  Ranking  the  result  based  on  the  user’s   preferences   •  Example:  How  do  I  get  to  Alice’s  place?   –  Modify   •  Alice  is  Alice.Doe@gmail.com     –  Rank   •  Choose  to  bike  if  possible  (user’s  preference  if  the  weather   is  nice)   •  Choose  the  route  by  the  river  if  it  is  open   Amélie  &  Serge,  EDBT,  11111011111     40  
  41. 41. Rich  search/queries   Context-­‐aware   •  We  remember  our  data  based  on   contextual  cues     •  Personal  informa-on  is  rich  in   contextual  informa-on   –  Metadata   –  Applica-on  data     –  Environment  knowledge     •  Cogni-ve  Psychology   –  contextual  cues  are  strong   triggers  for  autobiographical   memories     InteracJve   -  I  am  looking  for  a  great  movie  I   saw  about  a  month  ago   -  Was  it  on  TV?   -  No  in  a  theater.   -  Was  it  Turkish?   -  Yes.   -  It  must  be  Winter  Sleep.   Amélie  &  Serge,  EDBT,  11111011111     41  
  42. 42. Digital  Self  Architecture  @  Rutgers   Amélie  &  Serge,  EDBT,  11111011111     42   Architecture •  Data  CollecJon   –  Iden-fica-on,  retrieval,  storage     –   Personal  Extrac-on  Tool:     hxps://github.com/ameliemarian/DigitalSelf   •  Data  IntegraJon   –  Mul-dimensional,  context-­‐ aware,  unified  data  model   –  w5h  Model     •  Search     –  based  on  the  natural  memory   retrieval  process   –  Context-­‐aware,  approximate   –  -­‐w5h  Search     •  Knowledge  Discovery   –  Find  connec-ons  and  paxerns   –  Integrates  user  behavior  and   feedback  
  43. 43. Personal  data  analy-cs   Aka  Small  data   Elliox  Hedman,  Design  Research  Conference
  44. 44. Personal  data  analy-cs   •  Rela-vely  new  topic     –  First  Interna7onal  Workshop  on  Personal  Data  Analy-cs  in   the  Internet  of  Things  in  2014   •  Learn  from  personal  data  and  predic-ons   –  Personal  health  and  well-­‐being   –  Personal  transporta-on     –  Home  automa-on   •  Issues   –  Data  privacy   –  Complexity  of  “small”  data  analy-cs:  Less  is  harder   –  Combine  with  ver-cal  analy-cs:  large  groups  of  people   –  Varying  data  quality:  imprecision,  inconsistencies     Amélie  &  Serge,  EDBT,  11111011111     44  
  45. 45. Focus:  Quan-fied  self   •  From  sensors  &  all  kind  of  data   •  Health  and  well  being  model  of  the  person   •  Provide  alerts  and  counseling   •  Monitoring  and  support  for  pa-ents  with   chronic  condi-ons   •  Preven-ve  medicine   •  Ac-ve  par-cipa-on  of  the  person   •  Large-­‐scale  learning  –  privacy  issues   Amélie  &  Serge,  EDBT,  11111011111     45  
  46. 46. Towards  a  Personal  Knowledge  Base   •  Combine  informa-on  from  different  sources  to   infer  facts   –  Personal  Facts   –  Personal  Rules   –  Personal  Ontology     •  Example  Query  «  When  was  the  last  -me  I  was  in   Brussels?  »   •  Can  use  exis-ng  tools,  RDF,  RDFS,  SPARQL   Amélie  &  Serge,  EDBT,  11111011111     46  
  47. 47. Access  control  and  security  
  48. 48. Is  privacy  needed?   •  Because  young  people  expose  personal  life  online  more  likely   than  adults,  privacy  is  no  longer  the  social  norm  (M.   Zuckerberg)   •  Proved  totally  wrong   –  E.g.,  young  turn  to  ephemeral  communica-on  means  (Snapchat)   •  Privacy  paradox:  Internet  users  are  concerned  about  privacy   but  mostly  ignore  it  in  their  behaviors   Amélie  &  Serge,  EDBT,  11111011111     48  
  49. 49. Different  architectures   •  Connec-on  with  vendors  (same   for  other  services)   •  Secure  P2P   Amélie  &  Serge,  EDBT,  11111011111     49   PIMS   Vendor  rela-on   system   V1   V2   V3   PIMS   Trusted   intermediary   V1   V2   V3   Two-­‐-er   Three-­‐-er   Distributed   network   (P2P)     Secure   hardware  (e.g.,   FreedomBox)  
  50. 50. Secure  devices   Amélie  &  Serge,  EDBT,  11111011111     50   •  Secure  portable  tokens:  Secure  MCU  +  Flash  storage   –  Issues:  limita-ons  of  the  device   –  Example:  personal  medical  folder   •  Works  of  [Anciaux,Pucheral]    
  51. 51. Reducing  or  increasing  the  security  risk?   •  An  intrusion  on  my  PIMS  puts  all  my  informa-on  at  risk   •  Hard  to  be  riskier  than  today’s  model   –  Hardly  comfor-ng   •  The  PIMS  is  ran  by  a  professional  operator   –  Security/privacy  is  guaranteed  by  contract   –  Applica-ons  codes  are  verified  by  the  operator   –  The  PIMS  monitors  the  user’s  ac-ons  to  prevent  security   viola-ons   •  Data  of  different  users  are  isolated   –  Less  temp-ng  for  pirates   •  The  PIMS  does  not  solve  the  security  issues   •  It  provides  a  beXer  environment  to  address  them   Amélie  &  Serge,  EDBT,  11111011111     51  
  52. 52. Other  issues   •  Self  administra-on     •  Synchroniza-on  and  task  sequencing   •  Internet  of  things  
  53. 53. Support  for  system  administra-on   •  It  should  require  epsilon  competence   –  Users  are  osen  incompetent  and  in  par-cular  understand  lixle  about   access  control/security   •  It  should  be  epsilon  work   –  Users  are  not  interested   •  The  PIMS  helps   •  Administrate  external  applica-ons   •  Synchronize/backup  data     •  Select  services  and  op-ons   •  Manage  access  rights   –  Works  on  self-­‐tuning  systems/databases   –  Need  for  works  on  automa-cally  genera-ng  access  control  policies   from  behavior  of  users     Amélie  &  Serge,  EDBT,  11111011111     53  
  54. 54. Synchroniza-on  and     task  sequencing  across  devices   •  Many  possible  approaches   •  Service-­‐oriented  architecture   •  Workflow     –  Transfer  workflow  technology  to  the  masses   •  Mashup   –  uses  content  from  more  than  one  sources  to  create  a   single  new  service  displayed  in  a  single  graphical   interface   –  E.g.,  Yahoo  pipes   •  Ishisthenthat  style     Amélie  &  Serge,  EDBT,  11111011111     54  
  55. 55. A  hub  for  the  IoT   •  Internet  of  things:  Interconnec-on  of  iden-fiable   compu-ng  devices  within  the  exis-ng  Internet   infrastructure   •  Control  of  connected  objects   •  Explosion  of  things   –  E.g.,  heart  monitoring  implants,  biochip  transponders  on  farm   animals,  automobiles  with  built-­‐in  sensors,  field  opera-on   devices…   •  According  to  Gartner,  there  will  be  nearly  26  billion  devices   on  the  Internet  of  Things  by  2020   •  Many  will  be  personal  devices  that  the  PIMS  should   integrate/control   •  Possibly  a  killer  app  for  the  PIMS   Amélie  &  Serge,  EDBT,  11111011111     55  
  56. 56. Conclusion:  The  PIMS  are  arriving   For  societal,  technical,  industrial  reasons   They  will  change  our  lives  
  57. 57. Society  is  ready  to  move   •  Growing  resentment     –  Against  companies:  intrusive  marke-ng,  cryp-c   personaliza-on  and  business  decisions  (e.g.,  on   pricing),  creepy  "big  data"  inferences   –  Against  governments:  NSA  and  its  European   counterparts   •  Increasing  awareness  of  the  dissymmetry     –  between  what  these  systems  know  about  a  person,   and  what  the  person  actually  knows   •  Emerging  understanding  of  the  value  of  personal   data  for  individuals   57  Amélie  &  Serge,  EDBT,  11111011111    
  58. 58. Society  is  ready  to  move  (2)   •  Privacy  control:  regula-ons  in  Europe   •  Informa-on  symmetry:  Vendor  rela-on   management   •  Many  reports/proposals  that  affirm  the   ownership  of  personal  data  by  the  person   •  Personal  data  disclosure  ini-a-ves     –  Smart  Disclosure  (US);  MiData  (UK),  MesInfos  (France)   –  Several  large  companies  (network  operators,  banks,   retailers,  insurers…)  agreeing  to  share  with  customers   the  personal  data  that  they  have  about  them   Amélie  &  Serge,  EDBT,  11111011111     58  
  59. 59. Technology  is  gearing  up   •  System  administra-on  is  easier   –  Abstrac-on  technologies  for  servers   –   Virtualiza-on  and  configura-on  management  tools   •  Open  source  technology  more  and  more   available  for  services   •  Price  of  machines  is  going  down   –  A  hosted-­‐low  cost  server  is  as  cheap  as  5€/month   –  Paying  is  no  longer  a  barrier  for  a  majority  of  people   You  may  have  friends  already  doing  it   59  Amélie  &  Serge,  EDBT,  11111011111    
  60. 60. Technology  is  gearing  up  (2)   •  Many  systems  &  projects   –  Lifestreams,  Stuff-­‐I’ve-­‐Seen,  Haystack,  MyLifeBits,   Connec-ons,  Seetrieve,  Personal  Dataspaces,  or   deskWeb.     –  YounoHost,  Amahi,  ArkOS,  OwnCloud  or  Cozy  Cloud   •  Some  on  par-cular  aspects   –  Mailpile  for  mail   –  Lima  for  a  Dropbox-­‐like  service,  but  at  home.   –  Personal  NAS  (network-­‐connected  storage)  e.g.   Synologie   –  Personal  data  store  SAMI  of  Samsung...   •  Many  more     60  Amélie  &  Serge,  EDBT,  11111011111    
  61. 61. Industry  is  interested    (1)  Pre-­‐digital  companies   •  E.g.,  hotels  or  banks     •  Disintermediated  from  their  customers  by  pure   Internet  players  such  as  Google,  Amazon,   Booking.com,  Mint.     •  In  PIMS,  they  can  rebuild  direct  interac-on     •  The  playing  field  is  neutral     –  Unlike  on  the  Internet  where  they  have  less  data   •  They  can  offer  new  services  without   compromising  privacy   61  Amélie  &  Serge,  EDBT,  11111011111    
  62. 62. Industry  is  interested    (2)  Home  appliances  companies   •  Many  boxes  deployed  at  home  or  in   datacenters   – Internet  access  and  TV  "boxes”,  NAS  servers,   "smart"  meters  provided  by  energy  vendors,   home  automa-on  systems,  "digital  lockers”…   •  Personal  data  spaces  dedicated  to  specific   usage   •  Could  evolve  to  become  more  generic   •  Control  of  private  Internet  of  objects   62  Amélie  &  Serge,  EDBT,  11111011111    
  63. 63. Industry  is  interested    (3)  Pure  Internet  players   •  Amazon:  great  know-­‐how  in  providing  services   •  Facebook,  Google:  cannot  afford  to  be  out  of  a   movement  in  personal  data  management   •  Very  far  from  their  business  model  based  on   personal  adver-sement   •  Moving  to  this  new  market  would  require  major   changes  &  the  clarifica-on  of  the  rela-onship   with  users  w.r.t.  data  mone-za-on   Amélie  &  Serge,  EDBT,  11111011111     63  
  64. 64. They  will  change  our  lives:     (1)  rebalance  the  Web     •  User  control  over  their  data   –  Who  has  access  to  what,  under  what  rules,  to  do  what     •  User  empowerment   –  They  choose  freely  services  &  they  can  leave  a  service   •  Par-cipa-on  to  a  more  “neutral”  Web   –  With  the  "network  effects",  the  main  plaYorms  are   accumula-ng  data/customers  and  distor-ng  compe--on   –  The  PIMS  bring  back  fairness  on  the  Web   –  Good  practices  are  encouraged,  e.g.,  interoperability,   portability   64  Amélie  &  Serge,  EDBT,  11111011111    
  65. 65. They  will  change  our  lives:     (2)  new  func-onali-es   1.  Data  integra-on   2.  Search  and  queries   3.  Access  control  and  security   4.  Personal  data  analy-cs   5.  Self  administra-on     6.  Synchroniza-on  and  task  sequencing   7.  Control  of  Internet  of  things    …   65  Amélie  &  Serge,  EDBT,  11111011111    
  66. 66. (3)  So  watch  out  for  the  killer  apps   •  Personal  assistant   –  Google  now  enhanced   –  Appointments,  trips,  shopping   –  Tax,  financial,  insurance,  pension…   •  Health  monitoring   –  Quan-fied  self   –  Digital  medical  records   •  Smart  home   •  Elder  care  monitoring  and  advising   Amélie  &  Serge,  EDBT,  11111011111     66  
  67. 67. Come  and  share  PIMS   •  Lots  of  cool  problems   •  Lots  of  opportuni-es  for   your  favorite  data   management  techno     •  Lots  of  super  useful   applica-ons   •  And  some  killer  apps  to   invent   Amélie  &  Serge,  EDBT,  11111011111     67  
  68. 68. References   Data  IntegraJon:   •  A  survey  of  approaches  to  automa7c  schema  matching,  Rahm  &  Bernstein  2001.     •  Principles  of  Data  integra7on,  Doan,  Halevy,  Ives,  2012.   •  Principles  of  dataspace  systems,  Halevy,  Franklin,  and  Maier.  CACM,  2006.     •  Schema  matching  (Rahm  &  Bernstein  2001).     •  Data  integra7on,  Halevy,  Ashish,  Bixon,  et  al.  (2005)   Security  and  trust   •  Management  of  Personal  Informa7on  Disclosure:  The  Interdependence  of  Privacy,  Security,   and  Trust,  Clare-­‐Marie  Karat,  John  Karat,  and  Carolyn  Brodie   •  Secure  Personal  Data  Servers:  a  Vision  Paper.  T  Allard  et  al.  VLDB,  2010.   Knowledge  management   •  Web  Data  Management,  Serge  Abiteboul,  Ioana  Manolescu,  Philippe  Rigaux,  Marie-­‐Chris-ne   Rousset,  Pierre  Senellart,  Cambridge  University  Press,  2011.   •  Ontology  for  PIMS:  OntoPIM,  Ka-fori,  Poggi,  Scannapieco,  et  al.  2005   •  Networked  Environment  for  Personal,  Ontology-­‐based  Management  of  Unified  Knowledge   (NEPOMUK).     Amélie  &  Serge,  EDBT,  11111011111     69  
  69. 69. References   Data  extracJon   •  A  tool  for  personal  data  extrac7on.  D.  Vianna,  A.-­‐M.  Yong,  C.  Xia,  A.   Marian,  and  T.  Nguyen   •  Visual  Web  Informa7on  Extrac7on  with  Lixto,  R.  Baumgartner,  S.  Flesca,G.  Goxlob.   VLDB01   Societal  issues   •  Managing  your  digital  life  with  a  Personal  informa7on  management   system,    Serge  Abiteboul,  Benjamin  André,  Daniel  Kaplan,  Comm.  of  the   ACM,  to  appear   •  hxp://mesinfos.fing.org     •  hxp://www.midatalab.org.uk     •  hxps://www.data.gov/consumer/smart-­‐disclosure-­‐policy     •  hxp://socialsafe.net       Amélie  &  Serge,  EDBT,  11111011111     70  
  70. 70. References   PIMS:   •  As  we  may  think,  Vannevar  Bush,  the  Atlan-c  Monthly,  2005.   •  Personal  Informa7on  Management.  W.  Jones  and  J.  Teevan,  editors.                University  of  Washington  Press,  2007.   •  Beyond  total  capture:  a  construc7ve  cri7que  of  Lifelogging,  Sellen  and  Whitaker,  CACM  2010.   •  A  tool  for  personal  data  extrac7on.  Vianna,  Yong,  Xia,  Marian,    and  Nguyen,  IIWeb  2014.   •  Microsos’s  Stuff  I’ve  Seen  project,  Dumais  et  al.  SIGIR  2003.   •  MyLifeBits,  Gemmel,  Bell  and  Lueder,  CACM  2006.   •  deskWeb,  Zerr  et  al.  SIGIR  2010.   •  Connec7ons,  Soules  and  Ganger,  SOSP  2005.   •  Seetrieve,  Gyllstrom  and  Soules,  IUI  2008.   •  LifeStreams,  Fer-g,  Freeman,  and  Gelernter,  CHI  1996.   •  Haystack,  Karger  et  al.  CIDR  2005.   •  Understanding  What  Works:  Evalua7ng  PIM  Tools,  Diane  Kelly  and  Jaime  Teevan       Amélie  &  Serge,  EDBT,  11111011111     71  

×