Profiling User Interests on the Social Semantic Web

Fabrizio Orlandi's PhD Viva @Insight NUI Galway (ex-DERI) - 31/03/2014.

Supervisors: Alexandre Passant and John G. Breslin.
Examiners: Fabien Gandon and Stefan Decker

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

Presentation Transcript

  • Profiling User Interests on the Social Semantic Web. Ph.D. Viva, Fabrizio Orlandi
  • Context: Personalisation
  • Problem
  • Goal
  • Challenges: 1. Heterogeneous data sources
    Example interests: Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
    Microblog? Social Networking Service?
  • Challenges: 2. Lack of provenance
    Example interests: Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
    Who? What? Where? How?
  • Challenges: 3. Semantics of entities of interest
    Example interests: Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
    Semantics? Pragmatics? Relevance?
  • Research Questions
    1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?
    2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?
    3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?
  • Research Goal
    How can we collect, represent, aggregate, mine, enrich and deploy user profiles of interests on the Social Web for multi-source personalisation?
  • Methodology
  • 1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?
  • Aggregation of Social Web Data
    - Modelling solution for Social Web data and user profiles
    - Based on SIOC, FOAF and extensions
    - Experiments on wikis
    [Orlandi, Passant. WikiSym. ACM. 2010.]
  • Example
    - “Mastodon is the best heavy metal band from Atlanta… Can’t wait to see them live again!”
      Extracted entities: Music, Heavy Metal, Mastodon, Atlanta
    - “Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball”
      Extracted entities: CEV Champions League, Volleyball
    - User likes RDF and SemanticWeb on Facebook
      Extracted entities: Semantic Web, RDF
    - Natural language processing tools for entity extraction (Zemanta & Spotlight)
    - Frequency + time-decay weighting schemes
  • Aggregation and Mining of Interests
    7 types of user profiling strategies:
    - 2 types of DBpedia entities: Categories vs. Resources
    - 2 types of weighting scheme for category-based methods:
      Cat1: Interests Weight Propagation
      Cat2: Interests Weight Propagation w/ Cat. Discount
    - 2 types of exponential Time Decay function:
      Short mean lifetime (120 days)
      Long mean lifetime (360 days)
    - 1 “bag-of-words” (tag-based) state-of-the-art approach
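The combination of frequency and exponential time decay named on this slide can be illustrated with a minimal sketch. It assumes that each mention of an entity contributes exp(-age / mean lifetime) to that entity's weight; the slides only give the two mean lifetimes (120 and 360 days), so the exact formula, the function name `decayed_weights` and the sample mentions are assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def decayed_weights(mentions, now, mean_lifetime_days):
    """Sum exp(-age / mean_lifetime) over every mention of each entity.

    Frequency is captured by the sum (more mentions, more weight) and
    recency by the exponential factor (older mentions count less).
    """
    weights = defaultdict(float)
    for entity, timestamp in mentions:
        age_days = (now - timestamp).total_seconds() / 86400.0
        weights[entity] += math.exp(-age_days / mean_lifetime_days)
    return dict(weights)


# Hypothetical mentions: (entity, time of the post it was extracted from).
now = datetime(2014, 3, 31)
mentions = [
    ("Mastodon (band)", now),
    ("Mastodon (band)", now - timedelta(days=120)),
    ("Volleyball", now - timedelta(days=360)),
]

short = decayed_weights(mentions, now, mean_lifetime_days=120)  # fast decay
long_ = decayed_weights(mentions, now, mean_lifetime_days=360)  # slow decay

for entity in short:
    print(entity, round(short[entity], 3), round(long_[entity], 3))
```

With the longer mean lifetime, the year-old "Volleyball" mention keeps a noticeably larger weight, which is the behaviour the "slow" decay variant is meant to capture.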
  • Aggregation and Mining of Interests
    Evaluation:
    - User study: 21 users rating their user profiles from Twitter & Facebook
    - 210 ratings for each of the 7 different profiling methods
    [Chart: P@10 and AVG Score per profiling method, scale 0 to 1]
    Key findings:
    - DBpedia resource-based profiles outperform DBpedia category-based and tag-based profiles.
    - Best strategy: Resources + Frequency & Slow Time Decay weighting scheme
    [Orlandi, Breslin, Passant. I-Semantics. ACM. 2012.]
  • 1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?
    2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?
  • Provenance of Data
    Motivation: use of provenance information as core of the profiling heuristics to improve mining of user interests and semantic enrichment
    - Data Provenance as the history, the origins and the evolution of data:
      Who created/modified it? When? What is the content? Where is it located? How and Why was it created? Which tools and processes were used?
    - Provenance as the “bridge” between Social Web and Web of Data, e.g. Wikipedia/DBpedia
  • Use Case: Provenance on Wikis
    Provenance on the Social Web for the Web of Data
    - A semantic model to represent provenance information in wikis
    - A software architecture to extract provenance from Wikipedia
    - An application that uses and exposes provenance data to compute measures and statistics on Wikipedia articles
    [Orlandi, Champin, Passant. SWPM at ISWC. 2010.]
  • Provenance on the Social Web
  • Use Case: Provenance on DBpedia
    Provenance on the Web of Data for the Social Web
    - Using detailed provenance information extracted from Wikipedia we are able to compute provenance also for DBpedia resources.
    - Analyzing the “diffs” between the revisions of Wikipedia articles and the users' contributions we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.
    - We built a model and an application that shows provenance information for each triple on DBpedia that is the result of users' edits on Wikipedia.
    [Orlandi, Passant. Journal of Web Semantics. 2011]
  • Semantic provenance in DBpedia
    - Using detailed provenance information extracted from Wikipedia we are able to compute provenance also for DBpedia resources.
    - Analyzing the “diffs” between the revisions of Wikipedia articles and the users' contributions we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.
    - We built an application that shows provenance information for each triple on DBpedia that is the result of users' edits on Wikipedia.
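The idea of attributing DBpedia triples to Wikipedia edits by diffing revisions can be sketched as follows. This is a simplified illustration, not the system described in the slides: it assumes each revision has already been passed through a triple extractor, and the revision records, user names and the function `triple_provenance` are hypothetical.

```python
def triple_provenance(revisions):
    """Map each triple present in the latest revision to the edit that introduced it.

    `revisions` is a chronologically ordered list of dicts with the keys
    "rev_id", "user" and "triples" (the set of triples extracted from that
    revision of the article).
    """
    provenance = {}
    previous = set()
    for rev in revisions:
        current = set(rev["triples"])
        # Triples that appear for the first time (or reappear) in this edit.
        for triple in current - previous:
            provenance[triple] = (rev["user"], rev["rev_id"])
        # Triples deleted by this edit no longer belong to the resource.
        for triple in previous - current:
            provenance.pop(triple, None)
        previous = current
    return provenance


# Toy revision history (illustrative data only).
GENRE = ("db:Mastodon_(band)", "dbo:genre", "db:Heavy_metal_music")
HOMETOWN = ("db:Mastodon_(band)", "dbo:hometown", "db:Atlanta")

revisions = [
    {"rev_id": 1, "user": "alice", "triples": {GENRE}},
    {"rev_id": 2, "user": "bob", "triples": {GENRE, HOMETOWN}},
    {"rev_id": 3, "user": "carol", "triples": {HOMETOWN}},
]

prov = triple_provenance(revisions)
for triple, (user, rev_id) in prov.items():
    print(triple, "introduced by", user, "in revision", rev_id)
```

A set difference between consecutive revisions is the simplest possible "diff"; a real pipeline would work on the wikitext changes and map them to the extraction rules that produce each triple.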
  • Provenance for Profiling Interests
    Different provenance features to support interest mining
    - Not only: authorship and temporal features
    - But also: social media source, object, type of action, …
  • Provenance for Profiling Interests
    User study: 27 users on Twitter and Facebook
    They evaluated their aggregated and provenance-aware user profiles

    E/I  Social Feature          Score
    E    FB education            4.62
    E    FB workplace            4.60
    I    TW followees’ posts     4.03
    I    FB checkins             3.95
    E    FB interests            3.95
    E    FB likes                3.92
    I    TW favourite posts      3.76
    I    TW retweets             3.76
    I    TW posts                3.61
    I    TW replies              3.52
    I    FB status updates       3.50
    I    FB media actions        3.24
    I    FB comments             2.56
    I    FB direct posts         2.37

    - AVG Scores from 1 to 5
    - Locations, explicit profile info and also followees’ posts provide better accuracy for mining user interests
    - Interests stated explicitly by users produce user profiles 20% more accurate than implicit ones
    [Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]
  • 2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?
    3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?
  • Semantic Enrichment
    [Figure: Linked Data graph connecting the resources db:Montreal, db:Quebec, db:Gilles_Villeneuve, db:Ferrari and db:Formula_1 through the properties dbo:wikiPageWikiLink (twice), dbo:birthPlace and dbp:largestcity]
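A one-hop enrichment of a profile over such a graph could look like the sketch below. The slide only lists the resources and properties, so the direction and endpoints of each edge in `TRIPLES` are assumptions, as are the `enrich` function and the 0.5 discount applied to propagated weights.

```python
from collections import defaultdict

# Assumed edges for illustration; the slide does not state which resources
# each property connects.
TRIPLES = [
    ("db:Gilles_Villeneuve", "dbo:wikiPageWikiLink", "db:Ferrari"),
    ("db:Gilles_Villeneuve", "dbo:wikiPageWikiLink", "db:Formula_1"),
    ("db:Gilles_Villeneuve", "dbo:birthPlace", "db:Quebec"),
    ("db:Quebec", "dbp:largestcity", "db:Montreal"),
]


def enrich(profile, triples, discount=0.5):
    """Add every resource one hop away from an interest, with a discounted weight."""
    enriched = defaultdict(float, profile)
    for subject, _predicate, obj in triples:
        if subject in profile:
            enriched[obj] += discount * profile[subject]
    return dict(enriched)


profile = {"db:Gilles_Villeneuve": 1.0}
enriched = enrich(profile, TRIPLES)
for resource, weight in sorted(enriched.items(), key=lambda kv: -kv[1]):
    print(resource, weight)
```

Only direct neighbours of the original interest are added here, so db:Montreal (two hops away) stays out of the profile; calling `enrich` again on the result would reach it with a further discounted weight.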
  • Example
    Entities: Music, Heavy Metal, Mastodon (band), CEV Champions League, Volleyball, Semantic Web, RDF
    - Are all the extracted entities useful for personalisation?
    - How are concepts/entities being used on the Social Web? (Pragmatics)
    [Figure labels attached to the entities: Very abstract, very popular; Specific and time-dependent on events, etc. (twice); Abstract and not popular; Abstract and popular; Specific and not popular; Very popular]
  • Characterising Concepts of Interest
    Novel measures for the characterisation and semantic expansion of concepts of interest
    - Enrichment of entity-based user profiles for personalisation
    - Popularity of concepts on the Social Web (using Twitter)
      How popular is an entity on the Social Web? How frequently is it mentioned/used at that point of time?
    - Trend and temporal dynamics (using Wikipedia page views)
      The trend and evolution of the frequency of mentions of an entity on the Social Web (i.e. popularity over time)
    - Specificity and categorisation of entities of interest (using LOD)
      The level of abstraction that an entity has in a common conceptual schema shared by humans
  • Requirements
    Use case: real-time personalisation of Social Web streams
    1. Real-time computation of the dimensions
    2. Results constantly up to date with the real world
    3. Knowledge base and domain independent approach
  • Characterising Concepts of Interest
    Popularity? Trendy and Stable? Specificity?
    [Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM, WI 2013]
  • Real-time Semantic Personalisation of Social Web Streams
    “SPOTS”: A methodology for real-time personalisation of any large social stream
    - Automatic dynamic generation of multi-source user profiles of interests.
    - Semantic enrichment of concepts of interest with provenance and Linked Data info.
    - Ranking and selection of the interests according to their relevance for the user and for the personalisation use case.
    - Informativeness measures for posts to filter a large social stream.
    - Evaluation of the approach on the public Twitter stream
      Against Twitter #Discover: from 192% increase in accuracy
  • Real-time Semantic Personalisation of Social Web Streams
    [Kapanipathi, Orlandi, Sheth, Passant. SPIM at ISWC 2011.]
  • Evaluation on SPOTS
    User study to evaluate the impact of the enrichment on a personalisation use case
    27 users, 800 user ratings collected
    Main outcome: Popularity and Temporal Dynamics are useful measures for real-time personalisation

    SPOTS                     Improvement*
    No Enrichment             ---
    Trendy                    +29%
    Not Stable                +26%
    At Least 2 Features       +9%
    Specific + Not Popular    +5%

    * In recommendations accuracy over non-enriched profiles
  • Evaluation on User Profiles
    User study to evaluate the impact of the enrichment on user profiles according to users’ judgement
    27 users, 800 user ratings collected
    Main outcome: Specificity is more useful than popularity measures according to user perception

    User Profiles                 Improvement*
    No Enrichment                 ---
    Not Specific + Not Popular    +13%
    Not Specific                  +8%
    Not Popular                   +2%
    Stable + Not Trendy           +1%

    * In profile accuracy over non-enriched profiles
  • Summary
    [Orlandi, UMAP 2012]
  • Summary
    - We provide and evaluate a complete methodology for profiling user interests across multiple sources on the Social Web: Collect, Represent, Aggregate, Mine, Enrich, Deploy
    - Aggregation of user data: semantic representation of Social Web content and user activities
    - Provenance of data: improves profiling accuracy and connects Social Web and WoD
    - Mining of user interests: provenance + Linked Data/entity-based strategies + time decay outperform traditional “bag-of-words” strategies and facilitate enrichment
    - Semantic enrichment: improves profiling accuracy and is necessary for the deployment of the profiles in a personalisation use case. Different types of personalisation need different entities of interest
  • Future Work
    - Federated Personal Data Manager: privacy-aware, interoperable, autonomous user profiling infrastructure
    - Provenance at Web Scale: necessary to focus on techniques for an easier and less expensive tracking and management of provenance on the Social Semantic Web
    - Adaptive Profiling of User Interests: adaptation of the profiling algorithm and strategy according to the application and the context
  • Contributions & Dissemination
    - Semantic Web modelling solutions for Social Web data, user profiles, provenance on the Social Web and Web of Data.
    - A provenance computation framework
    - Novel measures for characterising entities of interest
    - A real-time personalisation system for large Social Web streams
    - User studies for different profiling strategies, provenance features and personalisation use-cases
    - A privacy-aware user profile management system
    Publications: 2 journal, 4 conference, 2 workshop papers
    Thanks!
  • Context
    - User Modelling: the process of representing a user or some of his/her characteristics (e.g. interests, workplace, location, etc.)
    - User Profile: a characterisation of a user at a particular point of time
  • Experiment
    6 types of user profiles evaluated:
    - 2 types of DBpedia entities: Categories vs. Resources
    - 2 types of weighting scheme for category-based methods:
      Cat1: Interests Weight Propagation
      Cat2: Interests Weight Propagation w/ Cat. Discount
    - 2 types of exponential Time Decay function:
      Short mean lifetime (120 days)
      Long mean lifetime (360 days)
  • Experiment
    6 types of user profiles evaluated: Res-120, Res-360, Cat1-120, Cat1-360, Cat2-120, Cat2-360
    [Figure: the six profiles grouped under Res and Cat, with Cat split into Cat1 and Cat2]
  • User-based Evaluation
    - We asked users to rate the top 10 interests generated for each of the 6 profiling strategies
    - Question: “Please rate how relevant is each concept for representing your personal interests and context…”
    - Rating: 0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)
    - Rating converted to a (0…10) scale
    - Performance evaluated with: MRR (Mean Reciprocal Rank), P@10 (Precision at K = 10)
    - Comparison with a Baseline: a traditional approach based on “keyword frequency”
  • Evaluation
    On average for: 200 Tweets & 200 Facebook posts, and items.
    - ~106 interests: DBpedia Resources
    - ~720 interests: DBpedia Categories (~7 times)
    Statistical significance for:
    - Resources vs. Categories (p<0.05)
    - Any method vs. Baseline (p<0.05)
    - Not for time decay (p~0.2) and Cat1 vs. Cat2
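The two metrics named in the user-based evaluation, P@10 and MRR, can be computed from per-user rating lists as in the sketch below. The slides do not say how the 0 to 5 ratings were mapped to the 0 to 10 scale or which rating counts as "relevant", so the doubling in `to_ten_scale` and the threshold of 6 are assumptions, and the two rating lists are made-up examples.

```python
def to_ten_scale(rating):
    """Map a 0..5 rating onto 0..10 (assumed here to be a simple doubling)."""
    return rating * 2


def precision_at_k(ratings, k=10, threshold=6):
    """Fraction of the top-k ranked interests rated at or above the threshold."""
    return sum(1 for r in ratings[:k] if r >= threshold) / k


def reciprocal_rank(ratings, threshold=6):
    """1 / rank of the first relevant interest, or 0.0 if none is relevant."""
    for rank, r in enumerate(ratings, start=1):
        if r >= threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_ratings, threshold=6):
    """Average the reciprocal rank over all users."""
    return sum(reciprocal_rank(r, threshold) for r in all_ratings) / len(all_ratings)


# Hypothetical ratings of each user's top 10 interests, in ranked order.
user_a = [to_ten_scale(r) for r in [5, 4, 0, 3, 1, 5, 2, 0, 4, 3]]
user_b = [to_ten_scale(r) for r in [0, 1, 4, 2, 0, 1, 3, 0, 2, 1]]

p_a = precision_at_k(user_a)
p_b = precision_at_k(user_b)
mrr = mean_reciprocal_rank([user_a, user_b])
print(p_a, p_b, round(mrr, 3))
```

P@10 rewards profiles whose top ten contain many relevant interests, while MRR rewards profiles that place a relevant interest near the top, which is why the two are reported together.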