
Benjamin Heitmann, PhD defence talk: An Open Framework for Multi-source, Cross-domain Personalisation with Semantic Interest Graphs



The work in this thesis addresses the new challenges and opportunities for online personalisation posed by the emergence of new infrastructures for sharing user preferences and for access to open repositories of data. As a result of these new infrastructures, user profiles can now include data from multiple sources about preferences in multiple domains. This new kind of user profile data requires a cross-domain personalisation approach. However, current cross-domain personalisation approaches are restricted to proprietary social networking ecosystems.

The main problem that we address in this thesis is to enable cross-domain recommendations without the use of proprietary and closed infrastructure. Towards this goal, we propose an open framework for cross-domain personalisation. Our framework consists of two parts: a conceptual architecture for recommender systems, and our cross-domain personalisation approach. The main enabling technology for our framework is Linked Open Data, as it provides a common data representation for user preferences and cross-domain links between concepts from many different domains.

As part of our framework, we first propose a conceptual architecture for Linked Open Data recommender systems that provides guidelines and best practices for the typical high-level components required for providing personalisation in open ecosystems using Linked Open Data. The architecture has a strong empirical grounding, as it is based on an empirical survey of 124 RDF-based applications.

Then we introduce and thoroughly evaluate SemStim, an unsupervised, graph-based algorithm for cross-domain personalisation. It leverages multi-source, domain-neutral user profiles and the semantic network of DBpedia in order to generate recommendations for different source and target domains. The results of our evaluation show that SemStim is able to provide cross-domain recommendations, without any overlap between target and source domains and without using any ratings in the target domain.

We show how we instantiate our proposed conceptual architecture for a prototype implementation that is the outcome of the ADVANSSE collaboration project with CISCO Galway. The prototype shows how to implement our framework for a real-world use case and data.

Our open framework for cross-domain personalisation provides an alternative to existing proprietary cross-domain personalisation approaches. As such, it opens up the potential for novel and innovative personalised services without the risk of user lock-in and data silos.

Published in: Technology

  • Personalisation has become an expected feature,
    but real-world recommender systems have also changed fundamentally, which motivates my PhD research.
  • To introduce my main research problem, I first need to show you a quick comparison of two different recommender system architectures.
    Closed inventory is used by e.g. Amazon, in contrast to Facebook which uses an open inventory.
    Open inventory gets preference data from multiple sources.
  • Multi-source user profiles with preference data about multiple domains, which is extremely hard to use for personalisation.
  • EMPHASIS: ADVANSSE shows that it works in the real world, and ties everything together by implementing both parts of the framework
    ADVANSSE research questions:
    Alternative, open ecosystem for cross-domain recommendations?
    Data structure for domain-neutral user profiles?
  • data discovery service: aggregate distributed profiles
    data homogenisation service: Integrate user profiles
    RDF store and graph access layer: Store multi-source & domain-neutral profiles
    Personalisation component runs cross-domain algorithm
    User interface: shows Recommendations to user
  • Now explain what happens in the personalisation component (uses RDF store and user interface).
    Emphasis: Graph algorithm on semantic network.
    Idea: Start with one domain/set of nodes, end in another.
    Graph search: Find a path between the two domains
    Spreading Activation (SA) has been described by Crestani; we extend it.

  • 5th goal: SemStim can use cross-domain recs. to mitigate cold-start problem for new users
  • The Cremonesi experiment protocol is very demanding, as the two classes are very unbalanced.
  • We had to come up with our own diversity metric, which is based on estimating the number of clusters.
  • Main features of challenge:
    1.) competition between teams
    2.) real-time result submission
    3.) secret ground truth for the test data
  • Target: 20 min
  • Based
  • XMPP Pub/Sub protocol enables distribution
    SPARQL Update used for data synchronisation
    Instantiates conceptual architecture
    Shows how to support an open ecosystem for personalisation
  • SemStim performs differently for different pairings of domains
    SemStim currently uses uniform weights for all edges
    Currently naïve baselines are used, which shows that the algorithm is quite robust; more sophisticated approaches could improve results.
  • 3 StackExchange Sites: security, web apps, bicycles
  • Transcript

    • 1. An Open Framework for Multi-source, Cross-domain Personalisation with Semantic Interest Graphs Benjamin Heitmann Ph.D. Viva Monday, 28 July 2014
    • 2. Personalisation has become an expected feature
      75% of consumers prefer personalised E-Commerce retailers
      94% of companies view personalisation as critical to business performance
      Examples: Amazon, Facebook
      Personalisation has become a commodity
      Section: Motivation
    • 3. Architecture of recommender systems: closed versus open inventory
      Main research problem: How to enable cross-domain personalisation without proprietary and closed infrastructure and algorithms?
    • 4. State-of-the-art limitations: Collaborative Filtering
      Research problems:
      Provide cross-domain recommendations without overlap?
      Cold-start problem?
    • 5. Definitions
      Definition of a domain: any set of recommendable items + set of users + preferences between users and items
      Source domain: a domain with non-empty preferences
      Target domain: a domain with recommendable items
      Cross-domain personalisation task: using preferences in the source domain to provide recommendations in a different target domain, with no overlap between source and target domain
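The definitions on slide 5 can be read as a small data structure. The following is a minimal sketch, not the thesis implementation: the class name `Domain`, the field names and the helper `is_cross_domain_task` are hypothetical, and "no overlap" is assumed here to mean that the two domains share no items.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A domain: recommendable items, users, and user-item preferences."""
    name: str
    items: set = field(default_factory=set)
    users: set = field(default_factory=set)
    # Maps each user to the set of items that user has a preference for.
    preferences: dict = field(default_factory=dict)

    def is_source(self) -> bool:
        # A source domain is a domain with non-empty preferences.
        return any(self.preferences.values())

    def is_target(self) -> bool:
        # A target domain is a domain with recommendable items.
        return bool(self.items)


def is_cross_domain_task(source: Domain, target: Domain) -> bool:
    """Check the task conditions, assuming overlap means shared items."""
    no_overlap = source.items.isdisjoint(target.items)
    return source.is_source() and target.is_target() and no_overlap
```

For example, a music domain with one user preference and a books domain that only lists items would qualify as source and target respectively, but not the other way round.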
    • 6. State-of-the-art limitations: Content-based filtering
      General requirement: Data with links between different domains?
      Research question: Cross-domain recommendations without ratings in target domain?
      [Figure: content-based filtering in single domains. Music: Garth Brooks, Johnny Cash, Iron Maiden, Metallica, with two "similar" links; Books: Catch 22, Harry Potter 1; Travel: Kyoto, New York. A second copy of the diagram adds "?" marks.]
    • 7. Enabling technology for cross-domain personalisation: Linked Open Data (LOD)
      LOD can enable cross-domain personalisation:
      1. Provides re-usable concept identifiers
      2. Cross-domain links for many different domains
      3. Standard for interoperable graph data
      Research question: Best practices for LOD recommender systems?
      [Figure: Linked Open Data cloud diagram, as of September 2011. Data set categories: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]
    • 8. Overview of approach
      Open framework for cross-domain personalisation:
      1. Conceptual architecture for recommender systems using Linked Open Data
      2. Cross-domain personalisation approach using RDF and Linked Data
      Prototype implementation based on the framework
      [Figure labels: Travel destinations; Movies; Multi-source user profiles with preferences from multiple domains; Cross-domain recommendation algorithm (SemStim) uses DBpedia as background knowledge; Recommendations for target domains]
    • 9. Conceptual Architecture for LOD recommender systems: Methodology
      Goal: identify best practices; list most common components; enable recommender systems to use Linked Data
      Methodology with strong empirical grounding:
      1. Survey of 124 RDF-based applications (2003 to 2009): 15 questions; original authors were contacted to verify or correct our assessment
      2. Architectural analysis to identify common components
      3. Extend proposed architecture for recommender systems
    • 10. Conceptual Architecture for LOD recommender systems
    • 11. Cross-domain algorithm: SemStim
      Requirements: graph algorithm; graph search between two domains
      SemStim extends Spreading Activation: adds targeted activation; adds constraints for algorithm duration
      [Figure: start of spreading activation in DBpedia. Group labels: User profile, Recommendable items. Nodes: Douglas Adams, The Hitchhikers Guide to the Galaxy (novel), Restaurant at the end of the universe, Kurt Vonnegut, Richard Dawkins, Atheism Activists, Cambridge, United Kingdom, Macmillan. Edge labels: dc:subject, author, subsequentWork, influencedBy, publisher, birthplace, subdivisionName, country.]
      Section: SemStim evaluation
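The idea on slide 11 (start from the profile nodes, spread activation through the graph, end on recommendable items) can be illustrated with a simplified spreading activation loop. This is a sketch under stated assumptions and not the SemStim algorithm itself: "targeted activation" is interpreted here as stopping once enough target items are activated, the duration constraint is a cap on the number of waves, all edges carry a uniform weight, and the function name, parameters and toy graph are hypothetical.

```python
from collections import defaultdict


def spread_activation(graph, profile, targets, threshold=1.0,
                      wanted=2, max_waves=5):
    """Spread activation from profile nodes until target items are reached.

    graph maps each node to a list of neighbours (list both directions
    for undirected edges). Returns up to `wanted` activated target nodes.
    """
    activation = defaultdict(float)
    fired = set()
    frontier = set(profile)            # profile nodes fire in the first wave
    activated_targets = []

    for _ in range(max_waves):         # duration constraint: bounded waves
        if not frontier:
            break
        incoming = defaultdict(float)
        for node in frontier:
            fired.add(node)
            for neighbour in graph.get(node, []):
                incoming[neighbour] += 1.0    # uniform edge weight
        frontier = set()
        for node, amount in incoming.items():
            if node in fired:
                continue               # each node fires at most once
            activation[node] += amount
            if activation[node] >= threshold:
                frontier.add(node)
                if node in targets and node not in activated_targets:
                    activated_targets.append(node)
        if len(activated_targets) >= wanted:
            break                      # assumed stop rule: enough targets active

    return sorted(activated_targets, key=lambda n: -activation[n])[:wanted]


# Toy graph loosely based on the node names on the slide; edges are illustrative.
TOY_GRAPH = {
    "Douglas Adams": ["Hitchhikers Guide", "Atheism Activists"],
    "Hitchhikers Guide": ["Douglas Adams", "Restaurant at the end of the universe"],
    "Atheism Activists": ["Douglas Adams", "Richard Dawkins"],
    "Restaurant at the end of the universe": ["Hitchhikers Guide"],
    "Richard Dawkins": ["Atheism Activists"],
}
TARGETS = {"Restaurant at the end of the universe", "Richard Dawkins"}
```

Calling `spread_activation(TOY_GRAPH, {"Douglas Adams"}, TARGETS)` reaches both target nodes in the second wave; with `max_waves=1` the duration limit stops the search before any target is activated.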
    • 12. Evaluation: Objectives
      1. Can SemStim provide single-domain recommendations?
      2. Can SemStim provide cross-domain recommendations?
      3. How diverse are the SemStim recommendations?
      4. Is there a connection between accuracy and diversity?
    • 13. Evaluation: comparison algorithms
      Algorithms for comparison: k-nn Collaborative Filtering; SVD++ Collaborative Filtering; Random selection; Linked Data Semantic Distance (LDSD); Set-based breadth first search (SetBFS)
      Background knowledge: DBpedia 3.8 (67m edges, 11m vertices)
    • 14. Single-domain accuracy experiment protocol
      Data set: MovieLens 100k
      Metrics: precision, recall, F1-score
      Experiment protocol: adapted from Cremonesi; top-k recommendation task; 90%/10% train/probe split
      Test profile: highly rated items in probe set plus random items
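The three metrics named on slide 14 can be computed for a top-k list as follows. This sketch uses the standard set-based definitions of precision, recall and F1-score; it does not reproduce the adapted Cremonesi protocol, and the function name is hypothetical.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Return (precision, recall, F1) for the first k recommended items.

    recommended is a ranked list of items; relevant is the set of items
    in the user's test profile that count as hits.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With 4 recommendations of which 2 are among 3 relevant items, precision is 2/4, recall is 2/3, and the F1-score is their harmonic mean, 4/7.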
    • 15. Single-domain accuracy experiment: results
      [Plot: F1-score (0.00 to 0.15) against number of recommendations (0 to 20) for CFknn, LDSD, Random, SemStim, SetBFS and SVD++; annotation: "SemStim"]
    • 16. Cross-domain accuracy experiment protocol
      Data set: Amazon SNAP; ratings from users with at least 20 ratings in two domains
      Metrics: precision, recall, F1-score
      Experiment protocol: source domain provides train profile; target domain provides test profile
      CF algorithms unsuitable due to high sparsity
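The profile construction on slide 16 can be sketched as a filter over rating tuples. The sketch assumes that "at least 20 ratings in two domains" means at least 20 ratings in each of the source and the target domain; the function name and the `(user, item, domain)` tuple layout are hypothetical.

```python
from collections import defaultdict


def split_cross_domain_profiles(ratings, source, target, min_ratings=20):
    """Build (train, test) profiles per user for one source/target pairing.

    ratings is an iterable of (user, item, domain) tuples. The train
    profile holds the user's source-domain items, the test profile the
    user's target-domain items.
    """
    per_user = defaultdict(lambda: defaultdict(set))
    for user, item, domain in ratings:
        per_user[user][domain].add(item)

    profiles = {}
    for user, by_domain in per_user.items():
        train = by_domain.get(source, set())
        test = by_domain.get(target, set())
        # Keep only users with enough ratings in both domains (assumption).
        if len(train) >= min_ratings and len(test) >= min_ratings:
            profiles[user] = (train, test)
    return profiles
```

Swapping the `source` and `target` arguments gives the reverse pairing, for example DVDs to Music and Music to DVDs.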
    • 17. Cross-domain accuracy experiment: DVDs >> Music
      [Plot: F1-score (0.000 to 0.015) against number of recommendations (0 to 30) for LDSD, Random, SemStim and SetBFS; annotation: "SemStim"]
    • 18. Cross-domain accuracy experiment: Music >> DVDs
      [Plot: F1-score (0.00 to 0.02) against number of recommendations (0 to 30) for LDSD, Random, SemStim and SetBFS; annotation: "SemStim"]
    • 19. Single-domain diversity experiment
      Data set: MovieLens 100k
      Experiment protocol: 95%/5% train/test split
      Results: diversity can be tuned; requires using all preferences (incl. negative)
      [Plot: diversity (0.00 to 1.00) for CFknn50, SVD++, Random, LDSD, SetBFS and for SemStim activation thresholds 0.1 to 1.5; x-axis: algorithm name, or activation threshold for SemStim; annotations: "Less diverse", "More diverse", "Increasing activation threshold"]
    • 20. LOD-RecSys challenge at ESWC 2014: Diversity recommendation task
      Data set: DBbook
      Metrics: F1-score @20 and Inter-List Diversity @20; ranking based on average rank for both metrics
      Diversity rec. task: recommend top-20 of all unrated items for each user
      Implementation challenge: real-time result submission; hidden ground-truth
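The ranking rule on slide 20 (average rank over both metrics) can be illustrated with a few lines of code. This is a sketch, not the challenge's scoring code: it assumes higher values are better for both metrics and ignores ties; the function name, the team names and the scores in the usage note are hypothetical.

```python
def rank_by_average(scores):
    """Order teams by the average of their ranks on two metrics.

    scores maps a team name to a (f1_at_20, diversity_at_20) pair.
    Rank 1 is the best value for a metric; ties are not handled.
    """
    def ranks(index):
        ordered = sorted(scores, key=lambda team: scores[team][index],
                         reverse=True)
        return {team: pos for pos, team in enumerate(ordered, start=1)}

    f1_rank, diversity_rank = ranks(0), ranks(1)
    average = {team: (f1_rank[team] + diversity_rank[team]) / 2
               for team in scores}
    return sorted(scores, key=lambda team: average[team])
```

For made-up scores `{"A": (0.06, 0.46), "B": (0.05, 0.48), "C": (0.03, 0.47)}`, team B is second on F1-score and first on diversity, so it has the best average rank even though team A has the best F1-score.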
    • 21. LOD-RecSys challenge at ESWC 2014: Diversity recommendation task
      Results: 3rd place out of 12 teams; competitive performance; SemStim unbiased; can balance accuracy and diversity
      [Plots: F1-score@20 (0.03 to 0.06) and InterListDiversity@20 (0.465 to 0.480), each against activation threshold (0.3 to 1.0); annotation on each plot: "Best rank"]
    • 22. ADVANSSE prototype
      Outcome of collaboration project with CISCO Galway
      Goals: show relevance to real-world, industry use case; implement cross-domain personalisation framework; instantiate conceptual architecture for LOD recommender systems; provide distributed and open ecosystem for cross-domain personalisation
    • 23. ADVANSSE use case
      Functional requirements:
      1. Filtering of subscriptions
      2. Recommendation of posts
      3. Updating of interests and recommendations
      [Figure labels: Marketing, Development, R & D, "?"]
    • 24. ADVANSSE distributed social platform
      [Figure labels: Bob, Cecilia, Andrew; ADVANSSE server; RDF store; XMPP server; Personalisation component; Data homogenisation service; Graph query language service; ADVANSSE connected social platform (1); ADVANSSE connected social platform (2); XMPP client; Application logic; Structured data authoring interface; User interface; XMPP]
    • 25. ADVANSSE prototype: user interface
    • 26. Summary of contributions
      Conceptual architecture: describes best practices for leveraging LOD for recommender systems; list of high-level components; strong empirical grounding
      Cross-domain recommendation approach using SemStim: can provide single-domain and cross-domain recommendations; no overlap between source & target domain required; no ratings in target domain required; competitive performance; diversity of recommendations can be tuned
      ADVANSSE prototype: based on real-world use case; shows how to use LOD to enable an ecosystem for cross-domain pers.
      Section: Conclusion
    • 27. Future work
      Investigate connection between performance of SemStim and choice of target and source domains
      Learning of weights for different edge types
      Improving the quality of linkage data
    • 28. Dissemination
      In top-3 for Diversity task at the LOD-RecSys challenge, ESWC 2014
      Publications: 2 book chapters; 1 journal paper; 2 conference papers; 2 workshop papers; 1 conference poster
      ADVANSSE web site:
    • 29. Extra graphs and data / details
    • 30. Motivation: New requirements for recommender systems
      Architecture of real-world RecSys has changed: shift from closed to open inventories; emergence of ecosystems to share user preference data
      New requirements for recommender systems: multi-source profiles; domain-neutral preferences; cross-domain personalisation
      Existing infrastructure and algorithms are proprietary and closed
    • 31. Diversity test for Amazon SNAP DVD data
      [Plot: F1-score (0.0 to 0.3) against value of topK (0 to 20) for CFknn50, LDSD, Random, SemStim02, SemStim03, SemStim04, SetBFS and SVDpp]
    • 32. Examples of cross-domain recommendations
    • 33. ADVANSSE prototype: Implementing domain-neutral user profiles
      Domain-neutral user profiles implemented using CISCO ERT schema
      Content extracted from 3 sites on StackExchange
      Integrated with DBpedia background knowledge
      Data storage: Jena TDB; HDT triple store
      [Figure: RDF schema. Resources: advansse:Question1 (rdf:type sioc:Post, advansse:Question), advansse:Answer1 (rdf:type sioc:Post, advansse:Answer), advansse:User1 (rdf:type sioc:UserAccount), advansse:Tag1 (rdf:type ctag:Tag), resource/Entity. Properties: dc:title, dc:description, dc:creator, sioc:name, ctag:label, ctag:means, ert:hasTopic, ert:interestedIn, advansse:hasAnswer, advansse:hasQuestion. Literals: Title, Post body, Answer body, Display Name, Tag String. Namespaces: sioc, ert, advansse, rdf, dc, ctag.]
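The profile schema on slide 33 can be pictured as a set of subject-predicate-object triples. The sketch below uses plain Python tuples rather than a triple store such as Jena TDB. The edge directions are one possible reading of the flattened diagram, `dbpedia:Bicycle` is a made-up example entity, and the helper names `match` and `interests` are hypothetical.

```python
# Prefixed names are kept as plain strings; namespace URIs are not expanded.
TRIPLES = [
    ("advansse:Question1", "rdf:type", "sioc:Post"),
    ("advansse:Question1", "rdf:type", "advansse:Question"),
    ("advansse:Question1", "dc:creator", "advansse:User1"),
    ("advansse:Question1", "ert:hasTopic", "advansse:Tag1"),
    ("advansse:Tag1", "rdf:type", "ctag:Tag"),
    ("advansse:Tag1", "ctag:means", "dbpedia:Bicycle"),      # example entity
    ("advansse:User1", "rdf:type", "sioc:UserAccount"),
    ("advansse:User1", "ert:interestedIn", "dbpedia:Bicycle"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def interests(triples, user):
    """Collect the entities a user is linked to via ert:interestedIn."""
    return {o for _, _, o in match(triples, s=user, p="ert:interestedIn")}
```

Because the interests point at shared entity identifiers rather than at items of one site, the same profile can in principle be used as the starting point for recommendations in any domain that links to those entities.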