A new direction for recommender systems: balancing privacy and personalisation
Dr. Benjamin Heitmann
Unit for Information Mining and Retrieval (UIMR)
INSIGHT Centre for Data Analytics
National University of Ireland, Galway
Overview of talk
• Why is privacy important for personalisation?
– Trade-off between privacy and personalisation
• Basics of recommender systems
– Why do they need so much data?
• Pieces of the puzzle / parts of the solution
– Ideas on open research topics
About me
• Post-doctoral researcher, INSIGHT @ NUI Galway
Research interests:
1. Personalisation using knowledge graphs
2. Architecture of data intensive ecosystems
3. Balancing privacy and personalisation
Why is privacy important for personalisation?
Personalisation versus privacy
• Personalisation has become an expected feature:
– 75% of consumers prefer personalised E-Commerce retailers
– 94% of companies view personalisation as critical to business performance
• However, reluctance to share user preferences is growing
2013/14 – A turning point for privacy online?
• June 2013: ex-NSA contractor Edward Snowden leaks 1.7 million documents
• Reveals systematic international and domestic surveillance in:
– USA
– “Five Eyes” (UK, AU, CA)
– Europe
• Targets commercial companies such as Microsoft, Apple, Google, Facebook
Why privacy for the Web?
Abuse of surveillance for industrial espionage and LOVEINT
What has changed?
The elephant in the room
– “Surveillance is the business model of the internet” (Bruce Schneier)
• Personalisation and privacy are fundamentally opposed
– Always a trade-off: utility versus privacy
• Data analytics and privacy have the same relationship
Examples from Facebook and Google
Summary: Why is privacy important for personalisation?
• “Surveillance” is the business model of the web
• Everybody does it:
– Governments and Businesses
• Why? Data is required for:
– Personalisation
– Analytics
– Advertisements
– More data -> better results
• How could we change that?
– We need to understand how personalisation works
Recommender systems basics
Parts of a recommender system
• Background data: data about the items (music, books, ...)
– Ratings
– Content data
• User profile: preferences as ratings or keywords
• Algorithm: uses background data to provide recommendations for one user profile
RecSys algorithm: Collaborative Filtering
• Most reliable algorithm
• Uses ratings to determine similarity
• Predicts missing ratings
• Requires lots of ratings
• Example:
– 10,000 users
– 1,000 items
– 10,000,000 matrix entries
– 7% required: 700,000 ratings
– 70 per user on average
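The arithmetic of this example, and the idea of predicting a missing rating from similar users, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ratings, the function names and the choice of a user-based variant with cosine similarity are not taken from the talk.

```python
from math import sqrt


def required_ratings(users, items, density):
    """Return (matrix entries, ratings needed, ratings needed per user)."""
    entries = users * items
    needed = round(entries * density)
    return entries, needed, needed / users


# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "d": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of the other users' ratings for item."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


entries, needed, per_user = required_ratings(10_000, 1_000, 0.07)
print(entries, needed, per_user)
print(predict("alice", "d"))
```

With only three users the prediction for "alice" leans almost entirely on "bob", which hints at why the approach needs many ratings per user before its predictions become reliable.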
RecSys algorithm: Content-based filtering
• Similarity determined from content features:
– Genre
– Keywords
– Description
– Author
• Requires less data from users
• Requires high-quality content descriptions
• Used together with Collaborative Filtering
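A minimal sketch of content-based similarity, assuming each item is described by a set of feature tags and that overlap is measured with the Jaccard index. The tags, the `ITEMS` table and the function names are hypothetical and only serve the illustration.

```python
# Hypothetical content descriptions; the feature tags are illustrative only.
ITEMS = {
    "Johnny Cash": {"genre:country", "keyword:guitar", "keyword:ballad"},
    "Garth Brooks": {"genre:country", "keyword:guitar", "keyword:stadium"},
    "Iron Maiden": {"genre:metal", "keyword:guitar", "keyword:stadium"},
}


def jaccard(a, b):
    """Share of features that two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(liked, items=ITEMS):
    """Rank all other items by feature overlap with one liked item."""
    scores = [
        (jaccard(items[liked], features), name)
        for name, features in items.items()
        if name != liked
    ]
    return [name for _, name in sorted(scores, reverse=True)]


print(most_similar("Johnny Cash"))
```

No ratings from other users are needed here, but the ranking is only as good as the feature tags, which is why high-quality content descriptions matter.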
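[Figure: items grouped by domain. Music: Garth Brooks and Johnny Cash are marked as similar, as are Iron Maiden and Metallica. Books: Catch 22, Harry Potter 1. Travel: Kyoto, New York. Links between the domains are marked "?".]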
Collecting user profile data for personalisation
• Architecture of real-world recommender systems has changed:
– Shift from closed to open inventories
– Emergence of ecosystems to share user preference data
Summary: Basics of recommender systems
• Personalisation algorithms require a lot of data
– Collaborative Filtering: uses rating data
– Content-based Filtering: uses content data
• “Magical” results not possible without gigantic amounts of data
• Current approaches collect even more data:
– De-centralised collection approach
• Processing of algorithm is centralised
Parts of the solution
Can we add privacy to personalisation?
• Let’s add “privacy” as a requirement.
– For instance, through public policy.
• What changes?
– Algorithm
– User experience (UX)
– Business model
Open research questions: parts of the solution
Part of the solution: Recommendation algorithm
• First step: privacy-enabled data mining
• Second step: privacy-enabled personalisation
• Use anonymised data
• De-centralised processing
– Cryptography?
– Secure multi-party computation?
– Oblivious data structures
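One way to picture secure multi-party computation in this setting is additive secret sharing: several parties jointly compute the sum of the users' ratings without any single party seeing an individual rating. The sketch below is a toy under strong assumptions (honest parties, no network, as many parties as users); `PRIME`, `share` and `secure_sum` are hypothetical names, and the talk does not prescribe this technique.

```python
import secrets

# Modulus for the shares (a Mersenne prime); it must exceed any possible sum.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n random shares that add up to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Sum private values while each party only ever sees random shares."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the j-th share of every user; one subtotal alone
    # reveals nothing about any individual value.
    subtotals = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    # Only the combination of all subtotals reveals the aggregate.
    return sum(subtotals) % PRIME


print(secure_sum([5, 3, 4, 1]))
```

Aggregates such as sums and counts are a building block for privacy-enabled data mining; how to get from such aggregates to recommendations for one individual user is the open step.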
[Figure: two pipelines. Privacy-enabled data mining: all users of a system → aggregation & anonymisation → insights from anonymised data set. Privacy-enabled personalisation: all users of a system → aggregation & anonymisation → anonymised data set → privacy-enabled personalisation (marked "?") → personalisation for individual user.]
Part of the solution: User Experience
• PGP vs. SSL
• Both are meant to keep communication secure and private
• Both use similar cryptographic ideas
• Very different user experience!
• One is a success, the other is a failure.
Part of the solution: Business Model
• Make the business model consumer-centric
• Treat users as customers, not as data sources
• Provide infrastructure for monetisation of private data
• Revenue from premium services
Summary: Parts of the solution
• Nobody has figured out a recipe for adding privacy to personalised services.
• If somebody does, it will probably depend on these puzzle pieces:
– Algorithm:
• Use anonymised data, de-centralised processing
– User experience:
• Make it easy to use and understand.
– Business model:
• Who pays? Where does the money come from?
Questions & Feedback?
• If you want to work / contribute / research / hack / start a business in relation to this new direction of personalisation, make sure to get in touch with me!
Benjamin.Heitmann@insight-centre.org
Additional Slides
Changes in the architecture of recommender systems
Main goal of PhD research:
Enable cross-domain personalisation
• Open framework for cross-domain personalisation
• Prototype implementation based on the framework
[Figure: multi-source user profiles with preferences from multiple domains → cross-domain recommendation algorithm (SemStim), which uses DBpedia as background knowledge → recommendations for target domains (shown: travel destinations, movies).]
Cross-domain algorithm: SemStim
• Algorithm uses knowledge graph from DBpedia
• Graph search between two sets of items
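The general idea of a graph search that starts at the user profile items and spreads activation towards recommendable items can be sketched as follows. This is not the SemStim algorithm itself: the toy graph, the decay factor, the equal split of energy between neighbours and all names are assumptions made for illustration, and the edges are not taken from DBpedia.

```python
from collections import defaultdict

# Hypothetical, undirected toy graph; node names and edges are illustrative.
EDGES = [
    ("book:A", "author:X"), ("book:B", "author:X"),
    ("book:A", "topic:T"), ("book:B", "topic:T"),
    ("city:C", "topic:T"), ("city:D", "country:K"),
]


def build_graph(edges):
    """Turn an edge list into an adjacency map (node -> set of neighbours)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def spread(graph, profile, targets, steps=3, decay=0.5):
    """Push activation outwards from profile items; rank reached targets."""
    activation = defaultdict(float)
    frontier = {item: 1.0 for item in profile}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, set())
            for nb in neighbours:
                # Each neighbour receives an equal, decayed part of the energy.
                next_frontier[nb] += decay * energy / len(neighbours)
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    ranked = sorted(
        ((activation[t], t) for t in targets
         if t not in profile and activation[t] > 0),
        reverse=True,
    )
    return [t for _, t in ranked]


graph = build_graph(EDGES)
print(spread(graph, {"book:A"}, ["book:B", "city:C", "city:D"]))
```

In this toy graph a city is reached from a book through a shared topic node, which is the kind of indirect link that makes recommendations across domains possible; the unconnected city receives no activation and is not recommended.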
[Figure: excerpt of the DBpedia graph around "The Hitchhiker's Guide to the Galaxy (novel)", connecting user profile items with recommendable items and marking the start of spreading activation. Nodes shown: Douglas Adams, Atheism Activists, Cambridge, United Kingdom, Macmillan, The Restaurant at the End of the Universe, Kurt Vonnegut, Richard Dawkins. Edge labels shown: dc:subject, author, subsequentWork, influencedBy, publisher, birthplace, subdivisionName, country.]
Part of the solution: Public Policy
• Who provides incentives for privacy?
• Public policy bodies, e.g. the European Union
• INSIGHT proposed a “Magna Carta for Data” in the EU
• Can provide a mandate for privacy-enabled personalisation.
