Aggregated, Interoperable and Multi-Domain User Profiles for the Social Web

Presentation given by Fabrizio Orlandi at I-Semantics 2012, Graz, Austria. More info at http://bit.ly/orlandi and http://i-semantics.tugraz.at/

Published in: Technology, Education

  1. Aggregated, Interoperable and Multi-Domain User Profiles for the Social Web
     Fabrizio Orlandi, John G. Breslin, Alexandre Passant
     I-Semantics, Graz, Austria, 5-7 Sept. 2012
     Digital Enterprise Research Institute (www.deri.ie): Enabling Networked Knowledge
     Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  2. User Profiling on the Social Web
     - Disconnected social websites
     - Isolated data silos
     - http://www.w3.org
  4. Our Solution
     - Interlink social websites
     - Merge and model user data (Integration & User Modelling)
     - User Profile
     - Personalise users’ experience using their profile: Recommendations, Adaptive Systems, Search Personalisation
  5. Linking Open Data
     - The Web of Data: a continuously evolving “open corpus”
     - LOD Cloud by R. Cyganiak and A. Jentzsch
  6. Representing User Profiles of Interest
     - Diagram: an RDF graph connecting foaf:Person, dbp:Semantic_Web, wi:Weighted_Interest, wo:Weight, wo:Scale and sioc:UserAccount
     - Properties shown: foaf:topic_interest, wi:preference, wi:topic, wo:weight, wo:weight_value (0.7), wo:scale, wo:min_weight (0.0), wo:max_weight (1.0), opm:wasDerivedFrom
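The terms on this slide can be read as a small set of RDF triples. The following is a minimal sketch in plain Python, not the authors' implementation. The transcript lists only the node and property names, so the way they are wired together here (for example, that `opm:wasDerivedFrom` links the weighted interest to the account) is an assumption. The identifiers `:alice` and `:alice_twitter` and the blank-node labels are hypothetical.

```python
def weighted_interest_triples(person, account, topic, weight_value,
                              min_weight=0.0, max_weight=1.0, node_id=1):
    """Build (subject, predicate, object) triples for one weighted interest.

    The edge directions are an assumed reading of the slide's diagram.
    """
    interest = f"_:interest{node_id}"  # hypothetical blank-node labels
    weight = f"_:weight{node_id}"
    scale = f"_:scale{node_id}"
    return [
        (person, "rdf:type", "foaf:Person"),
        (person, "foaf:topic_interest", topic),
        (person, "wi:preference", interest),
        (interest, "rdf:type", "wi:Weighted_Interest"),
        (interest, "wi:topic", topic),
        (interest, "wo:weight", weight),
        (interest, "opm:wasDerivedFrom", account),  # provenance of the interest
        (account, "rdf:type", "sioc:UserAccount"),
        (weight, "rdf:type", "wo:Weight"),
        (weight, "wo:weight_value", weight_value),
        (weight, "wo:scale", scale),
        (scale, "rdf:type", "wo:Scale"),
        (scale, "wo:min_weight", min_weight),
        (scale, "wo:max_weight", max_weight),
    ]


# Hypothetical user and account; the topic and weight are the ones on the slide.
triples = weighted_interest_triples(
    ":alice", ":alice_twitter", "dbp:Semantic_Web", 0.7
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj, ".")
```

Keeping the weight and its scale as separate nodes means a consumer can rescale weights from different sources before comparing them.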
  7. Software architecture
  8. Service-specific Data Collector
     - Facebook and Twitter sources
     - OAuth 2.0 user authentication system
     - PHP libraries: Facebook PHP-SDK, Twitter-async
     - Data collected from APIs (up to 1 year back):
       - User messages, posts, comments
       - Likes
       - Check-ins
       - Profile information
  9. Data Analyser & Profile Generator
  10. Data Analyser & Profile Generator
      - Natural Language Processing tool: Zemanta
        - Used to spot entities in the collected data and link them to DBpedia
      - List of entities as interests
        - Named entities (DBpedia URIs), their occurrences and metadata (provenance) are recorded.
      - Interest Weighting Strategy
        - Based on frequency and time distance.
          - Frequency => counting the number of occurrences
          - Time Distance => using an Exponential Time Decay function, $x(t) = x_0 e^{-t/\tau}$, where $\tau$ is the mean lifetime
      - RDF representation of interests and weights
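The weighting strategy on this slide can be sketched in a few lines of Python. The slide does not say how frequency and time decay are combined. This sketch assumes that every occurrence of an entity starts at $x_0 = 1$, decays with its age, and that the decayed values are summed. The function name and the sample ages are hypothetical.

```python
import math


def decayed_weight(ages_in_days, mean_lifetime_days=120.0, x0=1.0):
    """Sum the exponentially decayed contributions of an entity's occurrences.

    Each occurrence contributes x0 * exp(-t / tau), where t is its age in days
    and tau is the mean lifetime. Summing the contributions is an assumption
    about how frequency and time distance are combined.
    """
    return sum(x0 * math.exp(-age / mean_lifetime_days) for age in ages_in_days)


# Hypothetical ages (in days) of three mentions of the same entity.
ages = [3.0, 40.0, 300.0]
print(decayed_weight(ages, mean_lifetime_days=120.0))  # short mean lifetime
print(decayed_weight(ages, mean_lifetime_days=360.0))  # long mean lifetime
```

A longer mean lifetime lets old mentions keep more of their weight, so the 360-day setting favours long-standing interests over recent ones.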
  11. Profiles Aggregator
  12. Profiles Aggregator
      - Aggregation of the different platform-specific profiles into one global user profile of interests
      - Easy aggregation of the interests using RDF
        - Triples merged in the triplestore
        - Provenance of the interests preserved
      - Aggregation of the weights: $G_i = \sum_s W_s \cdot w_{is}$, where $G_i$ is the global weight of interest $i$, $W_s$ is the weight of source $s$, and $w_{is}$ is the weight of interest $i$ in source $s$
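The weight aggregation amounts to a weighted sum over sources. The sketch below assumes each platform-specific profile is a plain dictionary from interest to weight. The source weights (0.5 each) and the sample profiles are hypothetical, since the slides do not state which values were used.

```python
def aggregate_profiles(profiles, source_weights):
    """Compute G_i = sum over sources s of W_s * w_is for every interest i.

    profiles:       {source: {interest: weight of the interest in that source}}
    source_weights: {source: weight of that source}
    """
    global_weights = {}
    for source, interests in profiles.items():
        w_s = source_weights[source]
        for interest, w_is in interests.items():
            global_weights[interest] = global_weights.get(interest, 0.0) + w_s * w_is
    return global_weights


# Hypothetical platform-specific profiles and source weights.
profiles = {
    "twitter": {"dbp:Semantic_Web": 0.8, "dbp:Linked_Data": 0.4},
    "facebook": {"dbp:Semantic_Web": 0.6, "dbp:The_Clash": 1.0},
}
source_weights = {"twitter": 0.5, "facebook": 0.5}

global_profile = aggregate_profiles(profiles, source_weights)
for interest, weight in sorted(global_profile.items(), key=lambda kv: -kv[1]):
    print(interest, round(weight, 2))
```

An interest that appears in only one source still enters the global profile, scaled by that source's weight.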
  13. DBpedia Resources vs. Categories
      - A user profile as a ranked list of DBpedia Resources or Categories

        DBpedia Resources   weight   DBpedia Categories   weight
        The_Clash           0.82     Buzzwords            0.48
        Alternative_rock    0.71     Semantic_Web         0.87
        Semantic_Web        0.48     Web_Services         0.48
        Social_media        0.42     World_Wide_Web       0.39
        Linked_Data         0.39     Hypermedia           0.39
        …                   …        …                    …
  14. Categories weighting-schemes
      - 1st Strategy (Cat1):
        - Weights of the Resources/Interests propagated to the related Categories
        - Cat1 Weight = Sum of the weights of the Category’s Resources
      - 2nd Strategy (Cat2):
        - Same as 1st Strategy but with a discount for “broad” Categories:
          $CatDiscount = \frac{1}{\log(SP)} \cdot \frac{1}{\log(SC)}$
          where: SP = Set of Pages belonging to the Category, SC = Set of Sub-Categories.
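Both category strategies can be sketched as follows. Several points are assumptions rather than facts from the slides: the logarithms are taken of the set sizes, the natural logarithm is used, the two factors are multiplied, and a set with fewer than two members gets no discount (to avoid dividing by zero). The resource-to-category mapping and the category sizes are hypothetical.

```python
import math


def cat1_weights(resource_weights, resource_categories):
    """Cat1: a category's weight is the sum of the weights of its resources."""
    weights = {}
    for resource, weight in resource_weights.items():
        for category in resource_categories.get(resource, []):
            weights[category] = weights.get(category, 0.0) + weight
    return weights


def _inverse_log(size):
    # Assumption: no discount for sets with fewer than 2 members, because
    # log(0) is undefined and log(1) = 0 would cause a division by zero.
    return 1.0 / math.log(size) if size > 1 else 1.0


def category_discount(num_pages, num_subcategories):
    """Discount for broad categories: 1/log(|SP|) * 1/log(|SC|)."""
    return _inverse_log(num_pages) * _inverse_log(num_subcategories)


def cat2_weights(resource_weights, resource_categories, category_sizes):
    """Cat2: Cat1 weights multiplied by each category's discount."""
    cat1 = cat1_weights(resource_weights, resource_categories)
    return {c: w * category_discount(*category_sizes[c]) for c, w in cat1.items()}


# Hypothetical mapping from resources to categories.
resource_weights = {"dbp:Semantic_Web": 0.48, "dbp:Linked_Data": 0.39}
resource_categories = {
    "dbp:Semantic_Web": ["cat:Semantic_Web", "cat:Buzzwords"],
    "dbp:Linked_Data": ["cat:Semantic_Web", "cat:Hypermedia"],
}
# Hypothetical (number of pages, number of sub-categories) per category.
category_sizes = {
    "cat:Semantic_Web": (100, 10),
    "cat:Buzzwords": (500, 20),
    "cat:Hypermedia": (50, 5),
}

cat1 = cat1_weights(resource_weights, resource_categories)
cat2 = cat2_weights(resource_weights, resource_categories, category_sizes)
print(cat1)
print(cat2)
```

With these sample sizes, the broadest category (`cat:Buzzwords`) receives the strongest discount, which is the effect the second strategy aims for.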
  15. Experiment
      - 6 types of user profiles evaluated:
        - 2 types of DBpedia entities
          - Categories vs. Resources
        - 2 types of weighting-scheme for category-based methods
          - Cat1: Interests Weight Propagation
          - Cat2: Interests Weight Propagation w/ Cat. Discount
        - 2 types of exponential Time Decay function
          - Short mean lifetime: 120 days
          - Long mean lifetime: 360 days
  16. Experiment
      - 6 types of user profiles evaluated:
        - Res: Res-120, Res-360
        - Cat:
          - Cat1: Cat1-120, Cat1-360
          - Cat2: Cat2-120, Cat2-360
  17. User-based Evaluation
      - 21 users:
        - 21 to 45 years old
        - 76% IT students/researchers
      - Average User Activity
  18. User-based Evaluation
      - We asked users to rate the top 10 interests generated for each of the 6 profiling strategies
        - Question: “Please rate how relevant is each concept for representing your personal interests and context…”
        - Rating: 0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)
        - Rating converted to a (0…10) scale
      - Performance evaluated with:
        - MRR (Mean Reciprocal Rank)
        - P@10 (Precision at K = 10)
      - Comparison with a Baseline
        - A traditional approach based on “keyword frequency”
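The two metrics can be computed from the per-user ratings as sketched below. The slides do not say which rating counts as "relevant" or how the 0-5 ratings map to the 0-10 scale. This sketch assumes a linear conversion (rating × 2) and a hypothetical relevance threshold of 6. The sample ratings are invented for illustration.

```python
def to_ten_point_scale(rating):
    """Convert a 0-5 rating to a 0-10 scale (assumed to be linear)."""
    return rating * 2


def precision_at_k(ratings, k=10, threshold=6):
    """Fraction of the top-k ranked interests rated at or above the threshold."""
    return sum(1 for r in ratings[:k] if r >= threshold) / k


def reciprocal_rank(ratings, threshold=6):
    """1 / rank of the first relevant interest, or 0.0 if none is relevant."""
    for rank, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ratings_per_user, threshold=6):
    """Average the reciprocal rank over all users."""
    ranks = [reciprocal_rank(r, threshold) for r in ratings_per_user]
    return sum(ranks) / len(ranks)


# Hypothetical 0-5 ratings of the top 10 interests, in ranked order, for two users.
user_a = [to_ten_point_scale(r) for r in [5, 2, 4, 1, 3, 0, 0, 1, 2, 5]]
user_b = [to_ten_point_scale(r) for r in [1, 2, 4, 3, 0, 0, 0, 0, 0, 0]]

print(precision_at_k(user_a), precision_at_k(user_b))
print(mean_reciprocal_rank([user_a, user_b]))
```

MRR rewards a strategy for placing a relevant interest near the top of the list, while P@10 measures how many of the ten shown interests are relevant overall.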
  20. Categories vs. Resources
  21. Cat1 vs. Cat2 (Cat. Discount)
  22. t120 vs. t360
  23. Evaluation
      - On average for:
        - 200 Tweets
        - 200 Facebook posts and items
      - ~106 interests: DBpedia Resources
      - ~720 interests: DBpedia Categories (~7 times as many)
      - Statistical significance (t-Test & Wilcoxon’s test) for:
        - Resources vs. Categories (p<0.05)
        - Any method vs. Baseline (p<0.05)
        - Not for time decay (p~0.2) and Cat1 vs. Cat2
  24. Conclusions
      - User profiles generated with DBpedia Resources are more accurate than with Categories.
        - Using Categories generates 7 times more entities than using Resources (with comparable accuracy)
        - Useful for Recommendation Systems.
      - Semantics + disambiguation + time decay function outperforms traditional keyword-based methods.
      - Insight:
        - Sometimes Resources are “too specific” and Categories are “too broad” => mixed approach to be explored.
      - TODO: Evaluation in different scenarios (e.g. Recommendations)
  25. Thanks
      - Contacts: Fabrizio Orlandi, http://bit.ly/orlandi, fabrizio.orlandi@deri.org, @BadmotorF
