Hasler2014
Mem0r1es platform overview (platform to meaningfully share and consolidate digital memories and personal information), Hasler Stiftung, 2014
Prof. Philippe Cudré-Mauroux, eXascale Infolab, http://exascale.info/

Published in Data & Analytics
Transcript

  • 1. Hasler Stiftung SmartWorld Workshop, June 19, 2014, Thun — Switzerland Reclaim your Digital Life
  • 2. Motivation (1/3)
    Commoditization of digital equipment
    ■ Desktops, laptops, netbooks, mobile phones, tablets, e-book readers, set-top boxes, personal GPSs, digital cameras, TVs, etc.
    Fragmentation of information across devices
  • 3. Motivation (2/3)
    The story of my life...
    ■ Where are the pictures of my niece’s birthday?
    ■ How should I consolidate/back up my emails?
    Fortunately there’s the cloud, right?
  • 4. Motivation (3/3)
    2014 twist on Personal Information Management: lifelogging, health-monitoring
    ■ Everylog, Memoto, Google Glass, Nike's FuelBand, Fitbit, Samsung Gear Fit & competitors...
    ➡ Urgent need to index & integrate continuous personal feeds for automated processing
  • 5. Problem Definition
    Personal digital information is today fragmented and externalized
    ➡ “Each site is a silo, walled off from the others…” [TBL 10.2010]
    ■ Data partitioning
    ■ Loss of governance
    How can one automatically reclaim and meaningfully organize one's digital information, dispersed online and across various devices, to generate useful digital memories?
  • 6. MEM0R1ES...
    ...a highly-available, secure, scalable, and semantically rich platform to extract, preserve, integrate and expose personal information for a smarter world
  • 7. The MEM0R1ES Team
    ■ Prof. Dr. Philippe Cudré-Mauroux
    ■ Prof. Dr. Karl Aberer
    ■ Prof. Dr. Maria Sokhn
    ■ Julien Tscherrig
    ■ Joël Dumoulin
    ■ Michele Catasta
    ■ Dr. Gianluca Demartini
    ■ Alberto Tonon
  • 8. Last Year…
    Device & Service Wrappers [EIA-FR]
    ■ Generic Wrapper Architecture: SMTP, Gmail, Google Drive, Facebook, DBpedia, Flickr, LinkedIn
    ■ Browser wrapper [EPFL]: lifelogging of rich features (context, user activities and focus, etc.) from the browser
    Storage Infrastructure
    ■ Multi-purpose, declarative & elastic storage layer [UNIFR]
  • 9. Result from the Digital Reclaiming
    ➡ Heterogeneous Graphs of Entities
    Information duplication
    Sometimes with different facets
    Missing information
  • 10. Today’s Focus
    Meaningful information integration from heterogeneous graphs of entities
    1. Entity Search (AOR)
    2. Entity Typing (TRank)
    3. Entity Clustering (ZenCrowd, MemorySense, PREDIct)
    4. Entity Elicitation (Transactive Search)
    Use-case: leveraging digital mem0r1es from a conference participation (demonstrators)
  • 11. 1. Entity Search [UNIFR]
    Main idea: combine unstructured and structured search to find relevant entities in the graph
    ■ Inverted index to locate first candidates
    ■ Graph queries to refine the results
    ■ Graph traversals (queries on object properties)
    ■ Graph neighborhoods (queries on data type properties)
  • 12. 1. Entity Search ➡ up to 25% MAP improvement over BM25!
  • 13. 2. Entity Typing [UNIFR+EPFL]
    Entities can have many types (facets)
    ■ Which fine-grained types are most relevant given the context?
    [Figure: example types attached to a single entity: Thing, American Billionaires, People from King County, People from Seattle, Windows People, Agent, Person, Living People, American People of Scottish Descent, Harvard University People, American Computer Programmers, American Philanthropists]
  • 14. 2. Entity Typing
    Integrates Big Data types from the Web of data
    ■ Tree of 447,260 types
    ■ Rooted on <owl:Thing>
    ■ Depth of 19
    Ranks relevant types by analyzing the context
    ■ Textual context
    ■ Graph context
    ■ Decision trees
    ■ Linear regression
  • 15. 3. Entity Clustering
    Several efforts to cluster entities into meaningful groups depending on context:
    PREDIct [EIA-FR]
    ■ Extracts Web information through wrappers
    ■ Models topics through Latent Dirichlet Allocation
    ■ Predictions based on topic trends
  • 16. 3. Entity Clustering
    MemorySense [EPFL]
    ■ Clusters mobile data into macro-activities
    ■ Leverages location, machine-learning and an activity ontology
    B-hist [UNIFR+EPFL+EIA-FR]
    ■ Better browser history clustering through entity typing and machine-learning
  • 17. 4. Entity Elicitation [EPFL+UNIFR]
    Filling the gaps in mem0r1es entity graphs
    ■ e.g., ‘who also attended WWW03 last year?’
    ■ Traditional methods (Web crawling, machine-learning, micro-task crowdsourcing) are insufficient
    ■ Errors and lack of discriminative features (➘ precision)
    ■ Lack of public data (➘ recall)
  • 18. 4. Entity Elicitation
    Adapting the concept of transactive memories (group memories) from psychology
    ➡ Transactive search methods to elicit information
    ■ Social network analysis (to direct the search)
    ■ Crowdsourcing (to get the information)
    ■ 46% improvement (F1) over best alternative
  • 19. Demo
    Use-case on scientific conference memories
    Based on 4 demonstrators:
    ■ Visualizing clustered mobile data (MemorySense)
    ■ Information elicitation through Transactive Search (Hippocampus)
    ■ Browsing clustered Web history (B-hist)
    ■ Clustering and prediction of topics based on extracted information (PREDIct)
  • 20. Dissemination (1)
    Papers at top research venues:
    ■ Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012.
    ■ Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank: Ranking Entity Types Using the Web of Data. International Semantic Web Conference (ISWC) 2013.
    ■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: Hippocampus: answering memory queries using transactive search. WWW 2014.
    ■ Michele Catasta, Alberto Tonon, Vincent Pasquier, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: B-hist: Better Entity-Centric Search over Personal Web Browsing History. International Semantic Web Conference (ISWC) 2014.
    ■ Michele Catasta, Alberto Tonon, Gianluca Demartini, Jean-Eudes Ranvier, Karl Aberer, Philippe Cudré-Mauroux: B-hist: Entity-Centric Search over Personal Web Browsing History. Journal of Web Semantics, 2014 (to appear).
    ■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: TransactiveDB: Tapping into Collective Human Memories. PVLDB, 2014 (in revision).
    ■ Julien Tscherrig, Philippe Cudré-Mauroux, Elena Mugellini, Omar Abou Khaled, Maria Sokhn: SemantiConverter: A Flexible Framework to Convert Semi-Structured Data into RDF. Submitted for publication.
  • 21. Dissemination (2)
    Android app on Google Play
    Open-source release of most components
    ■ https://github.com/MEM0R1ES
    ISWC 2013 Best-Paper Award nominee (TRank)
    Semantic Web Challenge 2013 Finalist (B-hist)
    Wall Street Journal mention (B-hist, 30.10.2013)
    Technology transfer
    ■ Extracting entities (Google Zurich)
    ■ MemorySense (Samsung)
    ■ TRank (Yahoo!)
    Start-up (?)
  • 22. Current Research Directions
    Modelling tail-entities
    Transactive DB operator
    Automatic capture of important memories
    ■ Google Glass
    Software integration
  • 23. Conclusions
    Exciting project
    ■ Important, timely societal issues
    ■ Fundamental research questions
    ■ Data Storage, Data Integration, Data Clustering, Data Elicitation
    Stimulating collaboration
    ■ Involving 3 (4) institutions
    ➡ Thanks to all partners for their contributions!
    A number of tangible results already
    ■ Open-source software components
    ■ Publications at top research venues
    ■ Industry transfer
  • 24. Thanks a lot for your attention, … and many thanks to the Hasler Stiftung for funding this project! Questions?
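
The entity search idea from slide 11 (an inverted index to locate first candidates, then graph queries to refine the results) can be illustrated with a minimal Python sketch. This is not the project's implementation: the toy graph `ENTITIES`, the function names, and the `neighbor_weight` discount are all assumptions made for illustration, and the one-hop expansion is only a stand-in for the graph traversals the slide mentions.

```python
from collections import defaultdict

# Hypothetical toy entity graph: each entity has a text label and outgoing links.
ENTITIES = {
    "e1": {"label": "WWW 2014 conference", "links": ["e2", "e3"]},
    "e2": {"label": "Hippocampus paper", "links": ["e1"]},
    "e3": {"label": "Seoul", "links": []},
    "e4": {"label": "conference dinner photo", "links": ["e3"]},
}


def build_inverted_index(entities):
    """Map each lowercase label token to the set of entities containing it."""
    index = defaultdict(set)
    for entity_id, data in entities.items():
        for token in data["label"].lower().split():
            index[token].add(entity_id)
    return index


def search(query, entities, index, neighbor_weight=0.5):
    """Rank entities by keyword overlap, then expand one hop along graph links."""
    scores = defaultdict(float)
    # Step 1 (unstructured search): the inverted index yields first candidates.
    for token in query.lower().split():
        for entity_id in index.get(token, ()):
            scores[entity_id] += 1.0
    # Step 2 (structured search): linked entities inherit a discounted score.
    # Iterating over a snapshot keeps the expansion to exactly one hop.
    for entity_id, score in list(scores.items()):
        for neighbor in entities[entity_id]["links"]:
            scores[neighbor] += neighbor_weight * score
    # Highest score first; ties are broken by entity id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this toy data, the query "hippocampus paper" ranks the paper entity first and pulls in the linked conference entity second, even though the conference label shares no keyword with the query. A real system would presumably replace the raw token counts with a proper text-ranking function such as the BM25 baseline named on slide 12.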