Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked Data and the Social Web
 

Paper presented at the 2013 IEEE/WIC/ACM International
Conference on Web Intelligence, Atlanta, GA, USA

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

    Presentation Transcript

    • INSIGHT Centre for Data Analytics www.insight-centre.org Characterising concepts of interest leveraging Linked Data and the Social Web Fabrizio Orlandi, Pavan Kapanipathi, Amit Sheth, Alexandre Passant IEEE/WIC/ACM Web Intelligence Atlanta, GA, USA 20th November 2013 Copyright 2013 INSIGHT Centre for Data Analytics. All rights reserved. Semantic Web & Linked Data Research Programme
    • Scenario: Personalisation and User Profiling on the Social Web (image: http://www.flickr.com/photos/giladlotan/)
    • Solution [Orlandi et al., I-Semantics 2012]
        – Interlink social websites
        – Integration & User Modelling: merge and model user data (User Profile)
        – Personalise users’ experience using their profile: Recommendations, Adaptive Systems, Search Personalisation
    • Problem: Entity-based user profiles of interests
        – Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
    • Problem: Entity-based user profiles of interests
        – Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
        – Open questions: Semantics? Pragmatics? Relevance?
    • Linking Open Data: The Semantics of the Web of Data (figure: LOD Cloud by R. Cyganiak and A. Jentzsch)
    • Example
        – “Mastodon is the best heavy metal band from Atlanta… Can’t wait to see them live again!”
        – “Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball”
        – “W3C Invites Implementations of five Candidate Recommendations for RDF 1.1 #SemanticWeb”
        – Processing: named entity recognition and disambiguation; frequency + time-decay weighting scheme
        – Extracted entities: Music, Heavy Metal, Mastodon, Atlanta, CEV Champions League, Volleyball, Semantic Web, RDF
    • Example
        – Are all the extracted entities useful for personalisation?
        – How are concepts/entities being used on the Social Web? (Pragmatics)
        – Music: very abstract, very popular
        – Heavy Metal: very popular
        – Mastodon (band): specific and time-dependent on events, etc.
        – Atlanta (GA.): specific, very popular and time-dependent
        – CEV Champions League: specific and time-dependent on events, etc.
        – Volleyball: abstract and popular
        – Semantic Web: abstract and not popular
        – RDF: specific and not popular
    • The Dimensions of our Characterisation
        – Specificity: the level of abstraction that an entity has in a common conceptual schema shared by humans
        – Popularity: how popular an entity is on the Social Web, i.e. how frequently it is mentioned/used at that point in time
        – Temporal Dynamics: the trend and evolution of the frequency of mentions of an entity on the Social Web, i.e. popularity over time
    • Requirements (our use case: real-time personalisation of Social Web streams)
        1. (Quasi-) real-time computation of the dimensions
        2. Results constantly up to date with the real world
        3. Knowledge-base and domain independent approach
    • Popularity: we chose the Twitter Search API
        – We search for an entity on the Twitter stream in a short, recent time frame.
        – We run entity disambiguation on the resulting tweets to filter out noisy tweets.
        – We count the remaining tweets in the given time frame.
        – The Popularity measure is the resulting value in tweets/second.
        – This is fast, simple and up to date, but it only covers a short, recent time frame.
        – E.g. “Music” ~ 16.6 tw/s, “Heavy Metal” ~ 0.09 tw/s, “Semantic Web” ~ 0.0008 tw/s
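The last two steps of the popularity slide (count the remaining tweets, divide by the length of the time frame) can be sketched as follows. This is only an illustration: the function name `popularity`, the half-open window convention and the sample timestamps are assumptions, and the Twitter search and disambiguation steps are taken as already done.

```python
from datetime import datetime, timedelta

def popularity(tweet_times, window_start, window_end):
    """Tweets per second among already-disambiguated tweets inside the window."""
    seconds = (window_end - window_start).total_seconds()
    if seconds <= 0:
        raise ValueError("window must have positive length")
    # Keep only tweets posted inside [window_start, window_end).
    in_window = [t for t in tweet_times if window_start <= t < window_end]
    return len(in_window) / seconds

# Hypothetical data: one relevant tweet every 20 seconds for 10 minutes,
# plus one tweet posted just after the window closes.
start = datetime(2013, 11, 20, 12, 0, 0)
end = start + timedelta(minutes=10)
times = [start + timedelta(seconds=20 * i) for i in range(30)]
times.append(end + timedelta(seconds=5))

rate = popularity(times, start, end)  # 30 tweets over 600 seconds
```

A short window keeps the value current, at the cost of noisier estimates for rarely mentioned entities.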
    • Temporal Dynamics: we use Wikipedia page views
        – Entities are already mapped to DBpedia
        – MediaWiki API provides a long history of daily page views of Wikipedia articles
        – We use the Mean and Standard Deviation of the last 30 days of page views to identify whether the popularity of an entity is Stable/Unstable and Trendy/Non-Trendy
        – Examples: CEV_Champions_League, Typhoon_Haiyan (2013) (diagrams from: stats.grok.se)
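The slide states that the mean and standard deviation of the last 30 days of page views drive the Stable/Unstable and Trendy/Non-Trendy labels, but it does not give the decision rules. The sketch below fills that gap with assumed rules: an entity is called unstable when the coefficient of variation exceeds `cv_threshold`, and trendy when the latest day exceeds the mean by `trend_sigmas` standard deviations. Both thresholds and the function name are hypothetical.

```python
from statistics import mean, pstdev

def temporal_dynamics(daily_views, cv_threshold=0.5, trend_sigmas=2.0):
    """Label an entity from its daily page views (oldest first)."""
    window = daily_views[-30:]          # last 30 days, as on the slide
    mu = mean(window)
    sigma = pstdev(window)
    # Assumed rule: large relative spread means unstable popularity.
    unstable = mu > 0 and sigma / mu > cv_threshold
    # Assumed rule: latest day well above the recent mean means trendy.
    trendy = sigma > 0 and window[-1] > mu + trend_sigmas * sigma
    return {"mean": mu, "std": sigma, "stable": not unstable, "trendy": trendy}

flat = temporal_dynamics([100] * 30)            # constant interest
spike = temporal_dynamics([100] * 29 + [5000])  # sudden burst on the last day
```

With these assumed rules the constant series comes out stable and non-trendy, while the series with a burst on the last day comes out unstable and trendy.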
    • Specificity: we use the Linking Open Data (LOD) cloud
        – Most of the available knowledge bases (e.g. DMOZ, Wordnet, OpenCyc) are not up to date.
        – Wikipedia would be large, domain-independent and continuously updated, but:
            – entities are not organised hierarchically in a taxonomy
            – we cannot use taxonomy-based methods (i.e. super/sub-type relations)
            – plus, expensive algorithms would not be good for real-time computation
        – LOD Links Structure!
    • Graph based measures
        – State-of-the-art (SOA) graph-based method: indegree and outdegree (here called Incoming/Outgoing Predicates, IP and OP)
        – We can use these methods with RDF triples
        – We introduce “distinct in/out-degree” (IDP and ODP)
        – (diagram: node “m” with incoming edges from s1, s2, s3 and outgoing edges to o1, o2, labelled with predicates p1 to p4)
        – Values for “m”: IP (indegree) = 3, OP (outdegree) = 2, IDP (distinct indegree) = 2, ODP (distinct outdegree) = 2
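The four counts for node “m” can be reproduced with a few lines over a list of (subject, predicate, object) triples. The exact edge labels of the slide's diagram are not recoverable from the transcript, so the assignment of p1 to p4 below is an assumption chosen to match the stated values; `degree_measures` is a hypothetical helper name.

```python
def degree_measures(triples, node):
    """Count incoming/outgoing predicates of `node`, total and distinct."""
    incoming = [p for s, p, o in triples if o == node]
    outgoing = [p for s, p, o in triples if s == node]
    return {
        "IP": len(incoming),        # indegree
        "OP": len(outgoing),        # outdegree
        "IDP": len(set(incoming)),  # distinct incoming predicates
        "ODP": len(set(outgoing)),  # distinct outgoing predicates
    }

# Assumed reconstruction of the diagram: two subjects share predicate p1.
triples = [
    ("s1", "p1", "m"),
    ("s2", "p1", "m"),
    ("s3", "p2", "m"),
    ("m", "p3", "o1"),
    ("m", "p4", "o2"),
]
measures = degree_measures(triples, "m")
```

The distinct variants ignore how many neighbours use a predicate and only count how many different predicates appear, which is why IDP is 2 although three edges arrive at “m”.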
    • Our Specificity Measure
        – DRR (Distinct Relations Ratio): DRR = Incoming Distinct Predicates (IDP) / Outgoing Distinct Predicates (ODP)
        – Compared with: IP/OP, IP+OP, IP, IDP
        – Computed on the Sindice SPARQL endpoint in less than 1 sec.
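A minimal sketch of the DRR ratio over in-memory triples follows. The `ex:` identifiers are made-up illustration data, not taken from the LOD cloud, and returning `None` when an entity has no outgoing predicates is a choice made here, since the slides do not say how that case is handled.

```python
def drr(triples, entity):
    """Distinct Relations Ratio: distinct incoming / distinct outgoing predicates."""
    idp = {p for s, p, o in triples if o == entity}
    odp = {p for s, p, o in triples if s == entity}
    if not odp:
        return None  # ratio undefined without outgoing predicates (assumed handling)
    return len(idp) / len(odp)

# Hypothetical triples for illustration only.
triples = [
    ("ex:Mastodon", "ex:genre", "ex:Heavy_Metal"),
    ("ex:Metallica", "ex:genre", "ex:Heavy_Metal"),
    ("ex:Kerrang", "ex:subject", "ex:Heavy_Metal"),
    ("ex:Heavy_Metal", "ex:stylisticOrigin", "ex:Blues_Rock"),
    ("ex:Mastodon", "ex:hometown", "ex:Atlanta"),
]

drr_genre = drr(triples, "ex:Heavy_Metal")  # 2 distinct in, 1 distinct out
drr_band = drr(triples, "ex:Mastodon")      # 0 distinct in, 2 distinct out
drr_city = drr(triples, "ex:Atlanta")       # no outgoing predicates
```

On a real endpoint the two sets would come from counting distinct predicates in queries over the entity's incoming and outgoing triples rather than from a Python list.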
    • Alternative SOA Method: DMOZ (Open Directory Project) taxonomy
        – We use the hierarchical structure of DMOZ as an alternative method to measure specificity.
        – We manually map entities to the DMOZ entities and compute the distance from the root of the DMOZ tree.
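Once an entity has been manually mapped to a DMOZ category, its distance from the root can be read off the category path. The sketch assumes slash-separated paths rooted at `Top`; the example paths and the helper name `dmoz_depth` are hypothetical.

```python
def dmoz_depth(category_path, root="Top"):
    """Number of edges between the taxonomy root and the given category."""
    parts = [p for p in category_path.strip("/").split("/") if p]
    if not parts or parts[0] != root:
        raise ValueError(f"path must start at the root category {root!r}")
    return len(parts) - 1  # the root itself is at depth 0

# Hypothetical mappings for illustration.
depth_root = dmoz_depth("Top")
depth_music = dmoz_depth("Top/Arts/Music")
depth_metal = dmoz_depth("Top/Arts/Music/Styles/Rock/Heavy_Metal")
```

Deeper categories are treated as more specific, which gives an integer level per entity.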
    • Generation of a Gold Standard
        – Binary classification of entities: 5 humans classified 160 entities as Generic (38%) or Specific (62%), with substantial agreement (k=0.61)
        – Ranking of entities: 5 humans rated the specificity of 160 entities on a 1 to 10 scale (1=very generic, 10=very specific)
            – Average Rate: 7.03
            – Average Std. Dev.: 1.45
            – AVG Top 30 High Std. Dev.: 5.66
            – AVG Top 30 Low Std. Dev.: 7.51
        – Abstract entities are harder for humans to rate
    • Evaluation: Classification
        – We compared the different methods against the gold standard created manually by the users
        – Agreement with gold std. in the binary classification task:
            – DMOZ: 83.9%
            – DRR: 84.1%
            – IP/OP: 70.0%
            – IP+OP: 70.0%
            – IP: 72.5%
            – random: 61.9%
        – The performance of the DRR measure for this classification task is comparable to a manual classification done using the DMOZ taxonomy and to human judgement.
    • Evaluation: Ranking
        – We rank the specificity of 50 randomly chosen entities using:
            – Gold standard (average of the 5 users’ rates for each entity)
            – DMOZ levels (integers, 0 to 9): we compute “DMOZ-” and “DMOZ+” as the worst and best possible rankings compared to the gold standard ranking
            – DRR, IP/OP, IP+OP, random values (real numbers)
        – We compute NDCG (Normalized Discounted Cumulative Gain) at different ranking positions “p”. (DCGideal is the ranking of the gold std.)
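NDCG at position p can be sketched as below. The slides do not say which DCG variant was used, so the common form with a log2(i+1) discount on linear gains is assumed here; the gold ratings and the method's ranking are made-up numbers.

```python
import math

def dcg(relevances, p):
    """Discounted cumulative gain over the first p items (assumed log2 discount)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:p]))

def ndcg(ranked_relevances, p):
    """DCG of a ranking divided by the DCG of the ideal (gold) ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal, p)
    return dcg(ranked_relevances, p) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical gold-standard ratings (average of the users' rates per entity).
gold = {"A": 9.0, "B": 7.0, "C": 3.0, "D": 1.0}

# A method's ranking, most specific first, scored with the gold ratings.
method_ranking = ["B", "A", "C", "D"]
score = ndcg([gold[e] for e in method_ranking], p=3)

# The gold ordering itself is the ideal ranking.
perfect = ndcg([gold[e] for e in ["A", "B", "C", "D"]], p=3)
```

A tie-heavy scorer such as integer DMOZ levels admits several orderings, which is why a worst case and a best case are reported for it.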
    • Evaluation: Ranking (chart: NDCG per method at different positions). DRR: +5% for NDCG at 10 and 20
    • Evaluation on User Profiles
        – We evaluate the impact of the proposed measures on user profiles of interests, a real use case
        – Interests extracted from users’ posts on Facebook and Twitter with NLP tools (as described in our previous work [1])
        – Frequency-based + time decay weighting strategy
        – 27 volunteers; each user rated his/her Top 30 list of generated interests (total of 794 user ratings)
        – Ratings on a “1 to 5” scale according to how relevant/interesting each entity of interest is to the user (5 is highly relevant)
        – [1] Orlandi et al., I-Semantics 2012
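A frequency-based weighting with time decay can be sketched as follows. The slides only name the strategy, so the exponential decay, the 30-day half-life and the function names are assumptions; the actual scheme is described in the cited previous work [1].

```python
from collections import defaultdict

def interest_weights(mentions, now, half_life_days=30.0):
    """Sum one decayed unit per mention; older mentions contribute less."""
    weights = defaultdict(float)
    for entity, day in mentions:
        age = now - day                               # age of the mention in days
        weights[entity] += 0.5 ** (age / half_life_days)
    return dict(weights)

def top_interests(mentions, now, k=30, half_life_days=30.0):
    """Entities ordered by decayed frequency, highest first."""
    weights = interest_weights(mentions, now, half_life_days)
    return sorted(weights, key=weights.get, reverse=True)[:k]

# Hypothetical mentions as (entity, day number) pairs.
mentions = [("Volleyball", 0), ("Volleyball", 0), ("RDF", 60)]
weights = interest_weights(mentions, now=60)
ranking = top_interests(mentions, now=60)
```

Here two 60-day-old mentions count 0.25 each, so a single mention from today outranks them.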
    • Evaluation on User Profiles
        – Average score (1 to 5 scale) is computed according to groups of types of entities (chart annotations: +8%, 17%, +12%)
        – Not-popular and generic entities better represent users’ perception of their interests (but we have only 17% of them)
        – This behaviour might be different in other applications and use cases! (e.g. news recommendations, etc.)
    • Conclusions
        – Introduced dimensions for characterisation of concepts of interest: specificity, popularity and temporal dynamics.
        – Proposed methods for their computation satisfying requirements for real-time personalisation of Social Web streams: real-time, domain independent, up to date.
        – Introduced a novel measure (DRR) for specificity of concepts based on the LOD cloud
        – Evaluated for two different tasks (classification and ranking) against SOA methods (humans, DMOZ, graph measures)
        – Evaluated the impact of the measures on user profiles of interests (27 users and ~800 ratings): abstract and non-popular interests are preferred by users
    • Future work
        – Experiment with the measures on user profiles used for different personalisation tasks. E.g. a tweet recommender system should give priority to trendy, popular and specific entities instead.
        – Improve the simple popularity and trend detection methods.
        – Improve the DRR measure by adding more “semantics”, i.e. considering the different types of edges.
    • Thanks!
        – @badmotorf, fabrizio.orlandi@deri.org
        – @pavankaps, pavan@knoesis.org
        – @amit_p, amit@knoesis.org
        – @terraces, alex@seevl.net