SlideShare a Scribd company logo
1 of 60
Privacy-preserving Analytics
and Data Mining at LinkedIn
Practical Challenges & Lessons Learned
Krishnaram Kenthapadi
AI @ LinkedIn
Differential Privacy Deployed Workshop, September 2018
Key Takeaways
• Framework to compute robust, privacy-preserving analytics
• Addressing challenges such as preserving member privacy, product coverage,
utility, and data consistency
• Scalable system/tools for application across analytics applications
• Design decisions / trade-offs based on inputs from product / legal teams
• Privacy challenges/design for a large crowdsourced system (LinkedIn
Salary)
Analytics & Reporting Products at LinkedIn
Profile View Analytics
3
Content Analytics
Ad Campaign Analytics
All showing
demographics of
members engaging with
the product
• Admit only a small # of predetermined query types
• Querying for the number of member actions, for a specified time period,
together with the top demographic breakdowns
Analytics & Reporting Products at LinkedIn
• Admit only a small # of predetermined query types
• Querying for the number of member actions, for a specified time period,
together with the top demographic breakdowns
E.g., Clicks on a
given adE.g., Title = “Senior
Director”
Analytics & Reporting Products at LinkedIn
Privacy Requirements
• Attacker cannot infer whether a member performed an action
• E.g., click on an article or an ad
• Attacker may use auxiliary knowledge
• E.g., knowledge of attributes associated with the target member (e.g.,
obtained from this member’s LinkedIn profile)
• E.g., knowledge of all other members that performed similar action (e.g., by
creating fake accounts)
• Examples of potential attacks discussed in ACM CIKM’18 paper
Possible Privacy Attacks
7
Targeting:
Senior directors in US, who studied at Cornell
Matches ~16k LinkedIn members
→ over minimum targeting threshold
Demographic breakdown:
Company = X
May match exactly one person
→ can determine whether the person
clicks on the ad or not
Require minimum reporting threshold
Attacker could create fake profiles!
E.g. if threshold is 10, create 9 fake profiles
that all click.
Rounding mechanism
E.g., report incremental of 10
Still amenable to attacks
E.g. using incremental counts over time to
infer individuals’ actions
Need rigorous techniques to preserve member privacy, not
revealing exact aggregate counts
Key Product Desiderata
• Coverage & Utility
• Data Consistency
• for repeated queries
• over time
• between total and breakdowns
• across entity hierarchy
• across action hierarchy
• for top k queries
Problem Statement
Compute robust, reliable analytics in a privacy-
preserving manner, while addressing the product
desiderata such as coverage, utility, and consistency.
Our Approach
• Privacy model: Event-level differential privacy
• Pseudo-random noise generation
• Post-processing to achieve consistency
• Incremental: over time, the released counts are non-decreasing
• Consistency at the account/campaign group/campaign level
• Consistency for top k queries
K. Kenthapadi, T. T. L. Tran,
PriPeARL: A Framework for
Privacy-Preserving Analytics
and Reporting at LinkedIn,
ACM CIKM 2018.
PriPeARL: A Framework for Privacy-Preserving Analytics
11
Pseudo-random noise generation, inspired by differential privacy
● Entity id (e.g., ad
creative/campaign/account)
● Demographic dimension
● Stat type (impressions, clicks)
● Time range
● Fixed secret seed
Uniformly Random
Fraction
● Cryptographic
hash
● Normalize to
(0,1)
Random
Noise
Laplace
Noise
● Fixed ε
True
Count
Noisy
Count
To satisfy consistency
requirements
● Pseudo-random noise → same query has same result over time, avoid
averaging attack.
● For non-canonical queries (e.g., time ranges, aggregate multiple entities)
○ Use the hierarchy and partition into canonical queries
○ Compute noise for each canonical queries and sum up the noisy
counts
System Architecture
Performance Evaluation: Setup
13
● Experiments using LinkedIn ad analytics data
○ Consider distribution of impression and click
queries across (account, ad campaign) and
demographic breakdowns.
● Examine
○ Tradeoff between privacy and utility
○ Effect of varying minimum threshold (non-
negative)
○ Top-n queries
Performance Evaluation: Results
14
Privacy and Utility Tradeoff
For ε = 1, average absolute and signed
errors are small for both queries (impression and
click)
Variance is also small, ~95% of queries have error
of at most 2.
Top-N Queries
Common use case in LinkedIn application.
Jaccard distance as a function of ε and n.
This shows the worst case since queries with return
sets ≤ n and error of 0 were omitted.
Lessons Learned from Deployment (> 1 year)
• Consistency vs. utility trade-off
• Semantic consistency vs. unbiased, unrounded noise
• Suppression of small counts
• Online computation and performance requirements
• Scaling across analytics applications
• Tools for ease of adoption (code/API library, hands-on how-to tutorial) help!
Summary
• Framework to compute robust, privacy-preserving analytics
• Addressing challenges such as preserving member privacy, product coverage,
utility, and data consistency
• Future
• Utility maximization problem given constraints on the ‘privacy loss budget’
per user
• E.g., noise with larger variance to impressions but less noise to clicks (or conversions)
• E.g., more noise to broader time range sub-queries and less noise to granular time range
sub-queries
• Reference: K. Kenthapadi, T. Tran, PriPeARL: A Framework for Privacy-Preserving
Analytics and Reporting at LinkedIn, ACM CIKM 2018.
LinkedIn Salary
LinkedIn Salary (launched in Nov, 2016)
Salary Collection Flow via Email Targeting
Current Reach (September 2018)
• A few million responses out of several millions of members targeted
• Targeted via emails since early 2016
• Countries: US, CA, UK, DE, IN, …
• Insights available for a large fraction of US monthly active users
Data Privacy Challenges
• Minimize the risk of inferring any one individual’s compensation data
• Protection against data breach
• No single point of failure
Achieved by a combination of
techniques: encryption, access control,
, aggregation,
thresholding
K. Kenthapadi, A. Chudhary, and S.
Ambler, LinkedIn Salary: A System
for Secure Collection and
Presentation of Structured
Compensation Insights to Job
Seekers, IEEE PAC 2017
(arxiv.org/abs/1705.06976)
Modeling Challenges
• Evaluation
• Modeling on de-identified data
• Robustness and stability
• Outlier detection
X. Chen, Y. Liu, L. Zhang, and K.
Kenthapadi, How LinkedIn
Economic Graph Bonds
Information and Product:
Applications in LinkedIn Salary,
KDD 2018
(arxiv.org/abs/1806.09063)
K. Kenthapadi, S. Ambler,
L. Zhang, and D. Agarwal,
Bringing salary transparency to
the world: Computing robust
compensation insights via
LinkedIn Salary, CIKM 2017
(arxiv.org/abs/1703.09845)
Problem Statement
•How do we design LinkedIn Salary system taking into
account the unique privacy and security challenges,
while addressing the product requirements?
Title Region
$$
User Exp
Designer
SF Bay
Area 100K
User Exp
Designer
SF Bay
Area 115K
... ...
...
Title Region
$$
User Exp
Designer
SF Bay
Area 100K
De-identification Example
Title Region Company Industry Years of
exp
Degree FoS Skills
$$
User Exp
Designer
SF Bay
Area
Google Internet 12 BS Interactive
Media
UX,
Graphics,
...
100K
Title Region Industry
$$
User Exp
Designer
SF Bay
Area
Internet
100K
Title Region Years of
exp $$
User Exp
Designer
SF Bay
Area
10+
100K
Title Region Company Years of
exp $$
User Exp
Designer
SF Bay
Area
Google 10+
100K
#data
points >
threshold?
Yes ⇒ Copy to
Hadoop (HDFS) Note: Original submission stored as encrypted objects.
System
Architecture
Open Question
• Can we apply rigorous approaches such as differential privacy in such a
setting?
• While meeting reliability / product coverage needs
• Worst case sensitivity of quantiles to any one user’s compensation data
is large
•  Large noise may need to be added, depriving reliability/usefulness
• Need compensation insights on a continual basis
• Theoretical work on applying differential privacy under continual observations
• No practical implementations / applications
• Local differential privacy / Randomized response based approaches (Google’s RAPPOR;
Apple’s iOS differential privacy; Microsoft’s telemetry collection) don’t seem applicable
Summary
• LinkedIn Salary: a new internet application, with
unique privacy/modeling challenges
• Privacy vs. Modeling Tradeoffs
• Potential directions
• Privacy-preserving machine learning models in a practical setting
[e.g., Chaudhuri et al, JMLR 2011; Papernot et al, ICLR 2017]
• Provably private submission of compensation entries?
Thanks! Questions?
• References
• K. Kenthapadi, I. Mironov (Google), A. G. Thakurta (UC Santa Cruz; previously in
Apple privacy team), Privacy-preserving Data Mining in Industry: Practical
Challenges and Lessons Learned, ACM KDD 2018 Tutorial (Slide deck)
• K. Kenthapadi, T. T. L. Tran, PriPeARL: A Framework for Privacy-Preserving
Analytics and Reporting at LinkedIn, ACM CIKM 2018
• K. Kenthapadi, A. Chudhary, S. Ambler, LinkedIn Salary: A System for Secure
Collection and Presentation of Structured Compensation Insights to Job Seekers,
IEEE Symposium on Privacy-Aware Computing (PAC), 2017
Backup
Ad targeting:
Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM
Privacy Attacks On Ad Targeting
10 campaigns targeting 1 person (zip code, gender,
workplace, alma mater)
Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM
Facebook vs Korolova
Age
21
22
23
…
30
Ad Impressions in a week
0
0
8
…
0
10 campaigns targeting 1 person (zip code, gender,
workplace, alma mater)
Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM
Facebook vs Korolova
Interest
A
B
C
…
Z
Ad Impressions in a week
0
0
8
…
0
● Context: Microtargeted Ads
● Takeaway: Attackers can instrument ad campaigns to
identify individual users.
● Two types of attacks:
○ Inference from Impressions
○ Inference from Clicks
Facebook vs Korolova: Recap
LinkedIn Salary
Outline
• LinkedIn Salary Overview
• Challenges: Privacy, Modeling
• System Design & Architecture
• Privacy vs. Modeling Tradeoffs
LinkedIn Salary (launched in Nov, 2016)
Salary Collection Flow via Email Targeting
Current Reach (September 2018)
• A few million responses out of several millions of members targeted
• Targeted via emails since early 2016
• Countries: US, CA, UK, DE, IN, …
• Insights available for a large fraction of US monthly active users
Data Privacy Challenges
• Minimize the risk of inferring any one individual’s compensation data
• Protection against data breach
• No single point of failure
Achieved by a combination of
techniques: encryption, access control,
, aggregation,
thresholding
K. Kenthapadi, A. Chudhary, and S.
Ambler, LinkedIn Salary: A System
for Secure Collection and
Presentation of Structured
Compensation Insights to Job
Seekers, IEEE PAC 2017
(arxiv.org/abs/1705.06976)
Modeling Challenges
• Evaluation
• Modeling on de-identified data
• Robustness and stability
• Outlier detection
X. Chen, Y. Liu, L. Zhang, and K.
Kenthapadi, How LinkedIn
Economic Graph Bonds
Information and Product:
Applications in LinkedIn Salary,
KDD 2018
(arxiv.org/abs/1806.09063)
K. Kenthapadi, S. Ambler,
L. Zhang, and D. Agarwal,
Bringing salary transparency to
the world: Computing robust
compensation insights via
LinkedIn Salary, CIKM 2017
(arxiv.org/abs/1703.09845)
Problem Statement
•How do we design LinkedIn Salary system taking into
account the unique privacy and security challenges,
while addressing the product requirements?
Differential Privacy? [Dwork et al, 2006]
• Rich privacy literature (Adam-Worthmann, Samarati-Sweeney, Agrawal-Srikant, …,
Kenthapadi et al, Machanavajjhala et al, Li et al, Dwork et al)
• Limitation of anonymization techniques (as discussed in the first part)
• Worst case sensitivity of quantiles to any one user’s compensation data is
large
•  Large noise to be added, depriving reliability/usefulness
• Need compensation insights on a continual basis
• Theoretical work on applying differential privacy under continual observations
• No practical implementations / applications
• Local differential privacy / Randomized response based approaches (Google’s RAPPOR; Apple’s
iOS differential privacy; Microsoft’s telemetry collection) not applicable
Title Region
$$
User Exp
Designer
SF Bay
Area 100K
User Exp
Designer
SF Bay
Area 115K
... ...
...
Title Region
$$
User Exp
Designer
SF Bay
Area 100K
De-identification Example
Title Region Company Industry Years of
exp
Degree FoS Skills
$$
User Exp
Designer
SF Bay
Area
Google Internet 12 BS Interactive
Media
UX,
Graphics,
...
100K
Title Region Industry
$$
User Exp
Designer
SF Bay
Area
Internet
100K
Title Region Years of
exp $$
User Exp
Designer
SF Bay
Area
10+
100K
Title Region Company Years of
exp $$
User Exp
Designer
SF Bay
Area
Google 10+
100K
#data
points >
threshold?
Yes ⇒ Copy to
Hadoop (HDFS) Note: Original submission stored as encrypted objects.
System
Architecture
Collection
&
Storage
Collection & Storage
• Allow members to submit their compensation info
• Extract member attributes
• E.g., canonical job title, company, region, by invoking LinkedIn standardization services
• Securely store member attributes & compensation data
De-identification
&
Grouping
De-identification & Grouping
• Approach inspired by k-Anonymity [Samarati-Sweeney]
• “Cohort” or “Slice”
• Defined by a combination of attributes
• E.g, “User experience designers in SF Bay Area”
• Contains aggregated compensation entries from corresponding individuals
• No user name, id or any attributes other than those that define the cohort
• A cohort available for offline processing only if it has at least k entries
• Apply LinkedIn standardization software (free-form attribute  canonical version)
before grouping
• Analogous to the generalization step in k-Anonymity
De-identification & Grouping
• Slicing service
• Access member attribute info &
submission identifiers (no
compensation data)
• Generate slices & track #
submissions for each slice
• Preparation service
• Fetch compensation data (using
submission identifiers), associate
with the slice data, copy to HDFS
Insights
&
Modeling
Insights & Modeling
• Salary insight service
• Check whether the member is
eligible
• Give-to-get model
• If yes, show the insights
• Offline workflow
• Consume de-identified HDFS
dataset
• Compute robust compensation
insights
• Outlier detection
• Bayesian smoothing/inference
• Populate the insight key-value
stores
Security
Mechanisms
Security
Mechanisms
• Encryption of
member attributes
& compensation
data using different
sets of keys
• Separation of
processing
• Limiting access to
the keys
Security
Mechanisms
• Key rotation
• No single point of
failure
• Infra security
Preventing Timestamp Join based Attacks
• Inference attack by joining these on timestamp
• De-identified compensation data
• Page view logs (when a member accessed compensation collection web interface)
•  Not desirable to retain the exact timestamp
• Perturb by adding random delay (say, up to 48 hours)
• Modification based on k-Anonymity
• Generalization using a hierarchy of timestamps
• But, need to be incremental
•  Process entries within a cohort in batches of size k
• Generalize to a common timestamp
• Make additional data available only in such incremental batches
Privacy vs Modeling Tradeoffs
• LinkedIn Salary system deployed in production for ~2.5 years
• Study tradeoffs between privacy guarantees (‘k’) and data available for
computing insights
• Dataset: Compensation submission history from 1.5M LinkedIn members
• Amount of data available vs. minimum threshold, k
• Effect of processing entries in batches of size, k
Amount of
data
available vs.
threshold, k
Percent of
data available
vs. batch size,
k
Median delay
due to
batching vs.
batch size, k
Key takeaway points
• LinkedIn Salary: a new internet application, with
unique privacy/modeling challenges
• Privacy vs. Modeling Tradeoffs
• Potential directions
• Privacy-preserving machine learning models in a practical setting
[e.g., Chaudhuri et al, JMLR 2011; Papernot et al, ICLR 2017]
• Provably private submission of compensation entries?

More Related Content

What's hot

Gene Villeneuve - Moving from descriptive to cognitive analytics
Gene Villeneuve - Moving from descriptive to cognitive analyticsGene Villeneuve - Moving from descriptive to cognitive analytics
Gene Villeneuve - Moving from descriptive to cognitive analyticsIBM Sverige
 
SMART Seminar - The Future of Business Intelligence: Information 2020
SMART Seminar - The Future of Business Intelligence: Information 2020SMART Seminar - The Future of Business Intelligence: Information 2020
SMART Seminar - The Future of Business Intelligence: Information 2020SMART Infrastructure Facility
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science TeamsGanes Kesari
 
Cognitive analytics: What's coming in 2016?
Cognitive analytics: What's coming in 2016?Cognitive analytics: What's coming in 2016?
Cognitive analytics: What's coming in 2016?IBM Analytics
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analyticsAhmed Banafa
 
Cognitive Era and Introduction to IBM Watson
Cognitive Era and Introduction to IBM WatsonCognitive Era and Introduction to IBM Watson
Cognitive Era and Introduction to IBM WatsonSubhendu Dey
 
O'Reilly ebook: Machine Learning at Enterprise Scale | Qubole
O'Reilly ebook: Machine Learning at Enterprise Scale | QuboleO'Reilly ebook: Machine Learning at Enterprise Scale | Qubole
O'Reilly ebook: Machine Learning at Enterprise Scale | QuboleVasu S
 
Smart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning UpdateSmart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning UpdateDATAVERSITY
 
Introduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM WatsonIntroduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM WatsonSubhendu Dey
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebEdward Curry
 
Company presentation Servicenoew
Company presentation ServicenoewCompany presentation Servicenoew
Company presentation ServicenoewZeruiWei
 
AI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar DeckAI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar DeckThe Digital Insurer
 
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Keith Kraus
 
Decision Intelligence: How AI and DI (and YOU) are Evolving to the Next Level
Decision Intelligence: How AI and DI (and YOU) are Evolving to the Next LevelDecision Intelligence: How AI and DI (and YOU) are Evolving to the Next Level
Decision Intelligence: How AI and DI (and YOU) are Evolving to the Next LevelLorien Pratt
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
SMART Seminar Series: Learning Journeys – Making learning visible in developi...
SMART Seminar Series: Learning Journeys – Making learning visible in developi...SMART Seminar Series: Learning Journeys – Making learning visible in developi...
SMART Seminar Series: Learning Journeys – Making learning visible in developi...SMART Infrastructure Facility
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science TeamsEMC
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseSoftServe
 
AI and IBM Watson in legal
AI and IBM Watson in legalAI and IBM Watson in legal
AI and IBM Watson in legalAnders Quitzau
 

What's hot (20)

Gene Villeneuve - Moving from descriptive to cognitive analytics
Gene Villeneuve - Moving from descriptive to cognitive analyticsGene Villeneuve - Moving from descriptive to cognitive analytics
Gene Villeneuve - Moving from descriptive to cognitive analytics
 
SMART Seminar - The Future of Business Intelligence: Information 2020
SMART Seminar - The Future of Business Intelligence: Information 2020SMART Seminar - The Future of Business Intelligence: Information 2020
SMART Seminar - The Future of Business Intelligence: Information 2020
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science Teams
 
Cognitive analytics: What's coming in 2016?
Cognitive analytics: What's coming in 2016?Cognitive analytics: What's coming in 2016?
Cognitive analytics: What's coming in 2016?
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analytics
 
Cognitive Era and Introduction to IBM Watson
Cognitive Era and Introduction to IBM WatsonCognitive Era and Introduction to IBM Watson
Cognitive Era and Introduction to IBM Watson
 
O'Reilly ebook: Machine Learning at Enterprise Scale | Qubole
O'Reilly ebook: Machine Learning at Enterprise Scale | QuboleO'Reilly ebook: Machine Learning at Enterprise Scale | Qubole
O'Reilly ebook: Machine Learning at Enterprise Scale | Qubole
 
Smart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning UpdateSmart Data Webinar: Machine Learning Update
Smart Data Webinar: Machine Learning Update
 
Introduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM WatsonIntroduction to Cognitive Computing the science behind and use of IBM Watson
Introduction to Cognitive Computing the science behind and use of IBM Watson
 
SMART Seminar Series: SMART Data Management
SMART Seminar Series: SMART Data ManagementSMART Seminar Series: SMART Data Management
SMART Seminar Series: SMART Data Management
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
 
Company presentation Servicenoew
Company presentation ServicenoewCompany presentation Servicenoew
Company presentation Servicenoew
 
AI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar DeckAI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar Deck
 
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
 
Decision Intelligence: How AI and DI (and YOU) are Evolving to the Next Level
Decision Intelligence: How AI and DI (and YOU) are Evolving to the Next LevelDecision Intelligence: How AI and DI (and YOU) are Evolving to the Next Level
Decision Intelligence: How AI and DI (and YOU) are Evolving to the Next Level
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
SMART Seminar Series: Learning Journeys – Making learning visible in developi...
SMART Seminar Series: Learning Journeys – Making learning visible in developi...SMART Seminar Series: Learning Journeys – Making learning visible in developi...
SMART Seminar Series: Learning Journeys – Making learning visible in developi...
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
AI and IBM Watson in legal
AI and IBM Watson in legalAI and IBM Watson in legal
AI and IBM Watson in legal
 

Similar to Privacy-Preserving Analytics at LinkedIn

Fairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedInFairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedInC4Media
 
Fairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedInFairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedInKrishnaram Kenthapadi
 
Scaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsScaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsApplause
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overviewjkvr
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...Edge AI and Vision Alliance
 
Machine Learning Adoption: Crossing the chasm for banking and insurance sector
Machine Learning Adoption: Crossing the chasm for banking and insurance sectorMachine Learning Adoption: Crossing the chasm for banking and insurance sector
Machine Learning Adoption: Crossing the chasm for banking and insurance sectorRudradeb Mitra
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Advanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project DeliveryAdvanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project DeliveryMark Constable
 
Using Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROIUsing Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROIDATAVERSITY
 
Intro to Artificial Intelligence w/ Target's Director of PM
 Intro to Artificial Intelligence w/ Target's Director of PM Intro to Artificial Intelligence w/ Target's Director of PM
Intro to Artificial Intelligence w/ Target's Director of PMProduct School
 
Liberating data power of APIs
Liberating data power of APIsLiberating data power of APIs
Liberating data power of APIsBala Iyer
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data assetBala Iyer
 
Where does Data Democracy begin? [Segment-Synapse, 2019]
Where does Data Democracy begin? [Segment-Synapse, 2019]Where does Data Democracy begin? [Segment-Synapse, 2019]
Where does Data Democracy begin? [Segment-Synapse, 2019]aj_cache
 
Build Big Data Products at LinkedIn
Build Big Data Products at LinkedInBuild Big Data Products at LinkedIn
Build Big Data Products at LinkedInLili Wu
 

Similar to Privacy-Preserving Analytics at LinkedIn (20)

Fairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedInFairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedIn
 
Fairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedInFairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedIn
 
Scaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsScaling Training Data for AI Applications
Scaling Training Data for AI Applications
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overview
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
 
Machine Learning Adoption: Crossing the chasm for banking and insurance sector
Machine Learning Adoption: Crossing the chasm for banking and insurance sectorMachine Learning Adoption: Crossing the chasm for banking and insurance sector
Machine Learning Adoption: Crossing the chasm for banking and insurance sector
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Advanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project DeliveryAdvanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project Delivery
 
Data-X-v3.1
Data-X-v3.1Data-X-v3.1
Data-X-v3.1
 
Using Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROIUsing Machine Learning to Understand and Predict Marketing ROI
Using Machine Learning to Understand and Predict Marketing ROI
 
Intro to Artificial Intelligence w/ Target's Director of PM
 Intro to Artificial Intelligence w/ Target's Director of PM Intro to Artificial Intelligence w/ Target's Director of PM
Intro to Artificial Intelligence w/ Target's Director of PM
 
Liberating data power of APIs
Liberating data power of APIsLiberating data power of APIs
Liberating data power of APIs
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 
Get your data analytics strategy right!
Get your data analytics strategy right!Get your data analytics strategy right!
Get your data analytics strategy right!
 
Where does Data Democracy begin? [Segment-Synapse, 2019]
Where does Data Democracy begin? [Segment-Synapse, 2019]Where does Data Democracy begin? [Segment-Synapse, 2019]
Where does Data Democracy begin? [Segment-Synapse, 2019]
 
Model Factory at ING Bank
Model Factory at ING BankModel Factory at ING Bank
Model Factory at ING Bank
 
Week2 chapters1 3
Week2 chapters1 3Week2 chapters1 3
Week2 chapters1 3
 
Knowledge Discovery
Knowledge DiscoveryKnowledge Discovery
Knowledge Discovery
 
Build Big Data Products at LinkedIn
Build Big Data Products at LinkedInBuild Big Data Products at LinkedIn
Build Big Data Products at LinkedIn
 

More from Krishnaram Kenthapadi

Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)Krishnaram Kenthapadi
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Krishnaram Kenthapadi
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Krishnaram Kenthapadi
 
Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)Krishnaram Kenthapadi
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)Krishnaram Kenthapadi
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...Krishnaram Kenthapadi
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Krishnaram Kenthapadi
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...Krishnaram Kenthapadi
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Krishnaram Kenthapadi
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...Krishnaram Kenthapadi
 
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)Krishnaram Kenthapadi
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Krishnaram Kenthapadi
 

More from Krishnaram Kenthapadi (18)

Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)Responsible AI in Industry (ICML 2021 Tutorial)
Responsible AI in Industry (ICML 2021 Tutorial)
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
 
Amazon SageMaker Clarify
Amazon SageMaker ClarifyAmazon SageMaker Clarify
Amazon SageMaker Clarify
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
 
Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)
 
Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
 
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
 

Recently uploaded

Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Personfurqan222004
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts servicevipmodelshub1
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 

Recently uploaded (20)

Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Person
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 

Privacy-Preserving Analytics at LinkedIn

  • 1. Privacy-preserving Analytics and Data Mining at LinkedIn Practical Challenges & Lessons Learned Krishnaram Kenthapadi AI @ LinkedIn Differential Privacy Deployed Workshop, September 2018
  • 2. Key Takeaways • Framework to compute robust, privacy-preserving analytics • Addressing challenges such as preserving member privacy, product coverage, utility, and data consistency • Scalable system/tools for application across analytics applications • Design decisions / trade-offs based on inputs from product / legal teams • Privacy challenges/design for a large crowdsourced system (LinkedIn Salary)
  • 3. Analytics & Reporting Products at LinkedIn Profile View Analytics 3 Content Analytics Ad Campaign Analytics All showing demographics of members engaging with the product
  • 4. • Admit only a small # of predetermined query types • Querying for the number of member actions, for a specified time period, together with the top demographic breakdowns Analytics & Reporting Products at LinkedIn
  • 5. • Admit only a small # of predetermined query types • Querying for the number of member actions, for a specified time period, together with the top demographic breakdowns E.g., Clicks on a given adE.g., Title = “Senior Director” Analytics & Reporting Products at LinkedIn
  • 6. Privacy Requirements • Attacker cannot infer whether a member performed an action • E.g., click on an article or an ad • Attacker may use auxiliary knowledge • E.g., knowledge of attributes associated with the target member (e.g., obtained from this member’s LinkedIn profile) • E.g., knowledge of all other members that performed similar action (e.g., by creating fake accounts) • Examples of potential attacks discussed in ACM CIKM’18 paper
  • 7. Possible Privacy Attacks 7 Targeting: Senior directors in US, who studied at Cornell Matches ~16k LinkedIn members → over minimum targeting threshold Demographic breakdown: Company = X May match exactly one person → can determine whether the person clicks on the ad or not Require minimum reporting threshold Attacker could create fake profiles! E.g. if threshold is 10, create 9 fake profiles that all click. Rounding mechanism E.g., report incremental of 10 Still amenable to attacks E.g. using incremental counts over time to infer individuals’ actions Need rigorous techniques to preserve member privacy, not revealing exact aggregate counts
  • 8. Key Product Desiderata • Coverage & Utility • Data Consistency • for repeated queries • over time • between total and breakdowns • across entity hierarchy • across action hierarchy • for top k queries
  • 9. Problem Statement Compute robust, reliable analytics in a privacy- preserving manner, while addressing the product desiderata such as coverage, utility, and consistency.
  • 10. Our Approach • Privacy model: Event-level differential privacy • Pseudo-random noise generation • Post-processing to achieve consistency • Incremental: over time, the released counts are non-decreasing • Consistency at the account/campaign group/campaign level • Consistency for top k queries K. Kenthapadi, T. T. L. Tran, PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn, ACM CIKM 2018.
  • 11. PriPeARL: A Framework for Privacy-Preserving Analytics 11 Pseudo-random noise generation, inspired by differential privacy ● Entity id (e.g., ad creative/campaign/account) ● Demographic dimension ● Stat type (impressions, clicks) ● Time range ● Fixed secret seed Uniformly Random Fraction ● Cryptographic hash ● Normalize to (0,1) Random Noise Laplace Noise ● Fixed ε True Count Noisy Count To satisfy consistency requirements ● Pseudo-random noise → same query has same result over time, avoid averaging attack. ● For non-canonical queries (e.g., time ranges, aggregate multiple entities) ○ Use the hierarchy and partition into canonical queries ○ Compute noise for each canonical queries and sum up the noisy counts
  • 13. Performance Evaluation: Setup 13 ● Experiments using LinkedIn ad analytics data ○ Consider distribution of impression and click queries across (account, ad campaign) and demographic breakdowns. ● Examine ○ Tradeoff between privacy and utility ○ Effect of varying minimum threshold (non- negative) ○ Top-n queries
  • 14. Performance Evaluation: Results 14 Privacy and Utility Tradeoff For ε = 1, average absolute and signed errors are small for both queries (impression and click) Variance is also small, ~95% of queries have error of at most 2. Top-N Queries Common use case in LinkedIn application. Jaccard distance as a function of ε and n. This shows the worst case since queries with return sets ≤ n and error of 0 were omitted.
  • 15. Lessons Learned from Deployment (> 1 year) • Consistency vs. utility trade-off • Semantic consistency vs. unbiased, unrounded noise • Suppression of small counts • Online computation and performance requirements • Scaling across analytics applications • Tools for ease of adoption (code/API library, hands-on how-to tutorial) help!
  • 16. Summary • Framework to compute robust, privacy-preserving analytics • Addressing challenges such as preserving member privacy, product coverage, utility, and data consistency • Future • Utility maximization problem given constraints on the ‘privacy loss budget’ per user • E.g., noise with larger variance to impressions but less noise to clicks (or conversions) • E.g., more noise to broader time range sub-queries and less noise to granular time range sub-queries • Reference: K. Kenthapadi, T. Tran, PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn, ACM CIKM 2018.
  • 18. LinkedIn Salary (launched in Nov, 2016)
  • 19. Salary Collection Flow via Email Targeting
  • 20. Current Reach (September 2018) • A few million responses out of several millions of members targeted • Targeted via emails since early 2016 • Countries: US, CA, UK, DE, IN, … • Insights available for a large fraction of US monthly active users
  • 21. Data Privacy Challenges • Minimize the risk of inferring any one individual’s compensation data • Protection against data breach • No single point of failure Achieved by a combination of techniques: encryption, access control, , aggregation, thresholding K. Kenthapadi, A. Chudhary, and S. Ambler, LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers, IEEE PAC 2017 (arxiv.org/abs/1705.06976)
  • 22. Modeling Challenges • Evaluation • Modeling on de-identified data • Robustness and stability • Outlier detection X. Chen, Y. Liu, L. Zhang, and K. Kenthapadi, How LinkedIn Economic Graph Bonds Information and Product: Applications in LinkedIn Salary, KDD 2018 (arxiv.org/abs/1806.09063) K. Kenthapadi, S. Ambler, L. Zhang, and D. Agarwal, Bringing salary transparency to the world: Computing robust compensation insights via LinkedIn Salary, CIKM 2017 (arxiv.org/abs/1703.09845)
  • 23. Problem Statement •How do we design LinkedIn Salary system taking into account the unique privacy and security challenges, while addressing the product requirements?
  • 24. Title Region $$ User Exp Designer SF Bay Area 100K User Exp Designer SF Bay Area 115K ... ... ... Title Region $$ User Exp Designer SF Bay Area 100K De-identification Example Title Region Company Industry Years of exp Degree FoS Skills $$ User Exp Designer SF Bay Area Google Internet 12 BS Interactive Media UX, Graphics, ... 100K Title Region Industry $$ User Exp Designer SF Bay Area Internet 100K Title Region Years of exp $$ User Exp Designer SF Bay Area 10+ 100K Title Region Company Years of exp $$ User Exp Designer SF Bay Area Google 10+ 100K #data points > threshold? Yes ⇒ Copy to Hadoop (HDFS) Note: Original submission stored as encrypted objects.
  • 26. Open Question • Can we apply rigorous approaches such as differential privacy in such a setting? • While meeting reliability / product coverage needs • Worst case sensitivity of quantiles to any one user’s compensation data is large •  Large noise may need to be added, depriving reliability/usefulness • Need compensation insights on a continual basis • Theoretical work on applying differential privacy under continual observations • No practical implementations / applications • Local differential privacy / Randomized response based approaches (Google’s RAPPOR; Apple’s iOS differential privacy; Microsoft’s telemetry collection) don’t seem applicable
  • 27. Summary • LinkedIn Salary: a new internet application, with unique privacy/modeling challenges • Privacy vs. Modeling Tradeoffs • Potential directions • Privacy-preserving machine learning models in a practical setting [e.g., Chaudhuri et al, JMLR 2011; Papernot et al, ICLR 2017] • Provably private submission of compensation entries?
  • 28. Thanks! Questions? • References • K. Kenthapadi, I. Mironov (Google), A. G. Thakurta (UC Santa Cruz; previously in Apple privacy team), Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons Learned, ACM KDD 2018 Tutorial (Slide deck) • K. Kenthapadi, T. T. L. Tran, PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn, ACM CIKM 2018 • K. Kenthapadi, A. Chudhary, S. Ambler, LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers, IEEE Symposium on Privacy-Aware Computing (PAC), 2017
  • 30. Ad targeting: Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM Privacy Attacks On Ad Targeting
  • 31. 10 campaigns targeting 1 person (zip code, gender, workplace, alma mater) Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM Facebook vs Korolova Age 21 22 23 … 30 Ad Impressions in a week 0 0 8 … 0
  • 32. 10 campaigns targeting 1 person (zip code, gender, workplace, alma mater) Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM Facebook vs Korolova Interest A B C … Z Ad Impressions in a week 0 0 8 … 0
  • 33. ● Context: Microtargeted Ads ● Takeaway: Attackers can instrument ad campaigns to identify individual users. ● Two types of attacks: ○ Inference from Impressions ○ Inference from Clicks Facebook vs Korolova: Recap
  • 35. Outline • LinkedIn Salary Overview • Challenges: Privacy, Modeling • System Design & Architecture • Privacy vs. Modeling Tradeoffs
  • 36. LinkedIn Salary (launched in Nov, 2016)
  • 37. Salary Collection Flow via Email Targeting
  • 38. Current Reach (September 2018) • A few million responses out of several millions of members targeted • Targeted via emails since early 2016 • Countries: US, CA, UK, DE, IN, … • Insights available for a large fraction of US monthly active users
  • 39. Data Privacy Challenges • Minimize the risk of inferring any one individual’s compensation data • Protection against data breach • No single point of failure Achieved by a combination of techniques: encryption, access control, , aggregation, thresholding K. Kenthapadi, A. Chudhary, and S. Ambler, LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers, IEEE PAC 2017 (arxiv.org/abs/1705.06976)
  • 40. Modeling Challenges • Evaluation • Modeling on de-identified data • Robustness and stability • Outlier detection X. Chen, Y. Liu, L. Zhang, and K. Kenthapadi, How LinkedIn Economic Graph Bonds Information and Product: Applications in LinkedIn Salary, KDD 2018 (arxiv.org/abs/1806.09063) K. Kenthapadi, S. Ambler, L. Zhang, and D. Agarwal, Bringing salary transparency to the world: Computing robust compensation insights via LinkedIn Salary, CIKM 2017 (arxiv.org/abs/1703.09845)
  • 41. Problem Statement •How do we design LinkedIn Salary system taking into account the unique privacy and security challenges, while addressing the product requirements?
  • 42. Differential Privacy? [Dwork et al, 2006] • Rich privacy literature (Adam-Worthmann, Samarati-Sweeney, Agrawal-Srikant, …, Kenthapadi et al, Machanavajjhala et al, Li et al, Dwork et al) • Limitation of anonymization techniques (as discussed in the first part) • Worst case sensitivity of quantiles to any one user’s compensation data is large •  Large noise to be added, depriving reliability/usefulness • Need compensation insights on a continual basis • Theoretical work on applying differential privacy under continual observations • No practical implementations / applications • Local differential privacy / Randomized response based approaches (Google’s RAPPOR; Apple’s iOS differential privacy; Microsoft’s telemetry collection) not applicable
  • 43. Title Region $$ User Exp Designer SF Bay Area 100K User Exp Designer SF Bay Area 115K ... ... ... Title Region $$ User Exp Designer SF Bay Area 100K De-identification Example Title Region Company Industry Years of exp Degree FoS Skills $$ User Exp Designer SF Bay Area Google Internet 12 BS Interactive Media UX, Graphics, ... 100K Title Region Industry $$ User Exp Designer SF Bay Area Internet 100K Title Region Years of exp $$ User Exp Designer SF Bay Area 10+ 100K Title Region Company Years of exp $$ User Exp Designer SF Bay Area Google 10+ 100K #data points > threshold? Yes ⇒ Copy to Hadoop (HDFS) Note: Original submission stored as encrypted objects.
  • 46. Collection & Storage • Allow members to submit their compensation info • Extract member attributes • E.g., canonical job title, company, region, by invoking LinkedIn standardization services • Securely store member attributes & compensation data
  • 48. De-identification & Grouping • Approach inspired by k-Anonymity [Samarati-Sweeney] • “Cohort” or “Slice” • Defined by a combination of attributes • E.g, “User experience designers in SF Bay Area” • Contains aggregated compensation entries from corresponding individuals • No user name, id or any attributes other than those that define the cohort • A cohort available for offline processing only if it has at least k entries • Apply LinkedIn standardization software (free-form attribute  canonical version) before grouping • Analogous to the generalization step in k-Anonymity
  • 49. De-identification & Grouping • Slicing service • Access member attribute info & submission identifiers (no compensation data) • Generate slices & track # submissions for each slice • Preparation service • Fetch compensation data (using submission identifiers), associate with the slice data, copy to HDFS
  • 51. Insights & Modeling • Salary insight service • Check whether the member is eligible • Give-to-get model • If yes, show the insights • Offline workflow • Consume de-identified HDFS dataset • Compute robust compensation insights • Outlier detection • Bayesian smoothing/inference • Populate the insight key-value stores
  • 53. Security Mechanisms • Encryption of member attributes & compensation data using different sets of keys • Separation of processing • Limiting access to the keys
  • 54. Security Mechanisms • Key rotation • No single point of failure • Infra security
  • 55. Preventing Timestamp Join based Attacks • Inference attack by joining these on timestamp • De-identified compensation data • Page view logs (when a member accessed compensation collection web interface) •  Not desirable to retain the exact timestamp • Perturb by adding random delay (say, up to 48 hours) • Modification based on k-Anonymity • Generalization using a hierarchy of timestamps • But, need to be incremental •  Process entries within a cohort in batches of size k • Generalize to a common timestamp • Make additional data available only in such incremental batches
  • 56. Privacy vs Modeling Tradeoffs • LinkedIn Salary system deployed in production for ~2.5 years • Study tradeoffs between privacy guarantees (‘k’) and data available for computing insights • Dataset: Compensation submission history from 1.5M LinkedIn members • Amount of data available vs. minimum threshold, k • Effect of processing entries in batches of size, k
  • 59. Median delay due to batching vs. batch size, k
  • 60. Key takeaway points • LinkedIn Salary: a new internet application, with unique privacy/modeling challenges • Privacy vs. Modeling Tradeoffs • Potential directions • Privacy-preserving machine learning models in a practical setting [e.g., Chaudhuri et al, JMLR 2011; Papernot et al, ICLR 2017] • Provably private submission of compensation entries?