Distributed & Private
Computation
Praneeth Vepakomma
MIT
On Device Anonymize Obfuscate Smash/ Encrypt
Local calculations
Only download large
datasets
Random Identifiers Add Noise
Quantize
Differential Privacy
Smash: Convert to lower
dimensions or model
Encrypt: Use ‘shares’ and
multiple servers
ZKP: Zero knowledge proof
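The "Encrypt" entry above mentions 'shares' and multiple servers. One common way to realize that idea is additive secret sharing, sketched below. The modulus, the function names, and the three-server setup are illustrative assumptions and are not taken from the talk.

```python
import secrets

# Assumed field modulus (a Mersenne prime), chosen only for illustration.
PRIME = 2**61 - 1


def make_shares(secret, n_servers):
    """Split an integer into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % PRIME


def add_shared(shares_a, shares_b):
    """Each server adds its two local shares without seeing either input."""
    return [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]


if __name__ == "__main__":
    a = make_shares(120, 3)
    b = make_shares(80, 3)
    print(reconstruct(add_shared(a, b)))  # 200
```

Addition works share by share because the scheme is linear; multiplication of shared values needs extra machinery that this sketch does not cover.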
Imaging devices, Distributed ML and Privacy
Guiding Question
How to create portable skin imaging device(s) (like a wrist watch) that
collaborate and train to do better diagnostics without sharing sensitive data?
How is this incentivized?
2009: Crowdsensing
Patient
Scale

Population
Scale
Gupta, McDuff, Raskar 2016
Assuming penalty of
just $6 per person =
$480M
Reality: Per-person
costs are far
higher, in the hundreds
or thousands of
dollars.
Ethical, moral, legal,
trust, economic,
news and PR
repercussions.
Follow-up effects
Semantic Privacy:
Architectures to prevent
empirical reconstruction
attacks
Formal Privacy + Images
Differentially Private Image Retrieval
Iterative feedback/private data
structures: Improving DP Histograms
& set intersection verification
Parallel Combinatorial
Optimization without
submodularity
What’s new in DPC @ MIT: Distributed & Private Computation Projects
Distributed ML: Split Learning AirMixML PoC: Wireless + Formal Renyi
DP + ML
Splintering: A foundation for
distributed scientific
computation
Part 1: 'Distributed' ML (DML)
Part 2: 'Private' DML & Computation (DPC)
Today’s 'Distributed' ML
Server
Client1 Client2 Client3 ..
Master Algo for
Diagnostic, Treatments
Device 1 2 3 ..
Device
Data
Train ML
100
b1. Federated Learning
Server
Client1 Client2 Client3 ..
[McMahan17]
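A minimal sketch of the federated averaging idea behind this diagram: each client trains locally on its own data and the server only averages the resulting weights, weighted by local dataset size. The one-parameter model, learning rate, and toy data are assumptions made for illustration.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Client side: a few gradient steps on mean squared error for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, clients):
    """Server side: broadcast w, collect updates, average by dataset size."""
    updates = [local_update(w_global, data) for data in clients]
    total = sum(len(data) for data in clients)
    return sum(w * len(data) for w, data in zip(updates, clients)) / total


if __name__ == "__main__":
    # Toy clients whose private data all follow y = 2 * x.
    clients = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(3.0, 6.0)],
        [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
    ]
    w = 0.0
    for _ in range(20):
        w = federated_round(w, clients)
    print(round(w, 4))
```

Only the scalar weight crosses the network in this sketch; the raw (x, y) pairs stay with each client.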
b2. Split Learning
Server
Client1 Client2 Client3 ..
Smasher
Smashed Data
Back
Prop
[Gupta17, Vepakomma, Singh, Raskar]
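The flow in this diagram can be sketched with a two-parameter toy model: the client computes the smashed data, the server finishes the forward pass and returns only the gradient with respect to the smashed data. The class names, the scalar "layers", and the setup where labels sit on the server are assumptions for illustration.

```python
class Client:
    """Holds the first part of the model and the raw inputs."""

    def __init__(self, w=0.5):
        self.w = w
        self.x = None

    def forward(self, x):
        self.x = x
        return self.w * x  # smashed data sent to the server

    def backward(self, grad_z, lr):
        # Chain rule: dLoss/dw_client = dLoss/dz * dz/dw_client.
        self.w -= lr * grad_z * self.x


class Server:
    """Holds the rest of the model; never sees the raw input x."""

    def __init__(self, w=0.5):
        self.w = w

    def train_step(self, z, y, lr):
        pred = self.w * z
        grad_pred = 2 * (pred - y)       # derivative of squared error
        grad_z = grad_pred * self.w      # computed before the server update
        self.w -= lr * grad_pred * z
        return grad_z


def train(data, epochs=200, lr=0.02):
    client, server = Client(), Server()
    for _ in range(epochs):
        for x, y in data:
            z = client.forward(x)
            grad_z = server.train_step(z, y, lr)
            client.backward(grad_z, lr)
    return client, server


if __name__ == "__main__":
    data = [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5), (2.0, 6.0)]  # y = 3 * x
    client, server = train(data)
    print(round(client.w * server.w, 4))
```

The other configurations mentioned in the talk, such as training without sharing labels, change which side computes the loss but keep the same exchange of activations and gradients.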
12
VGG over CIFAR
10
ResNet over
CIFAR 100
Federated
Split
Compute Bandwidth
Vepakomma, Swedish, Gupta, Dubey, Raskar 2018
Raskar, McMahan et al CVPR 2019
Communication ratio: Rectangular Hyperbola
Local Synchronization/Asynchronous SL
Split
DenseNet for
Classification
Split
U-Net for
Segmentation
Versatile modes of
operation (MIT/MGH)
Input Data
Labels
Input Data
Labels
Input
Data
Input
Data
Client 2
Server
Client 1
Smasher
Server
Classic Boomerang without
sharing labels Multi-Client/Vertical Partitioning
IoT
Low Compute/Comms
(Cannot train models)
HealthData
Few Clients
(non-homogeneous data)
Complex Models
(large unoptimized models)
Too Little Data Per Client
(Unviable to train or send large models)
Many Untrusted Parties
(How to encourage 3rd party developers)
Challenges in Federated Learning
Part 2:
‘Private’ DML & Computation (DPC)
Client
Server
Smasher
Input Data
Labels
Attacker
Privacy and Attacks: Preventing reconstruction
Inverse
Model
Invert Data Invert Model
Private Collaborative Inference
Set-up
Attacks
Client FL
Server
Smasher
Smashed Data
𝐳
NoPeek-Infer via Decorrelation
Input Data
Labels
Input Data 𝐱
NoPeek-Infer: Preventing reconstruction attacks in distributed predictive inference after on-premise training
Vepakomma, Singh, Zhang, Raskar 2021
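The slide only says "decorrelation". The published NoPeek work uses distance correlation between raw inputs and smashed activations as the leakage measure to be reduced, so the sketch below assumes that measure. It computes the sample distance correlation in plain Python; the function names are hypothetical, and a real training loop would add this term to the task loss in a deep learning framework.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]          # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, zs):
    """Sample distance correlation between paired samples xs and zs."""
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(zs)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    dvar_product = mean_product(a, a) * mean_product(b, b)
    if dvar_product <= 0:
        return 0.0  # one side is constant, so there is no dependence to measure
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_product))


if __name__ == "__main__":
    inputs = [(0.0,), (1.0,), (2.0,), (4.0,)]
    leaky = [(3 * x[0] + 1,) for x in inputs]   # activations that mirror the input
    print(distance_correlation(inputs, leaky))
```

A value near 1 means the activations still carry the geometry of the inputs; a decorrelation penalty pushes this value down while the task loss keeps the activations useful.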
Decorrelation
Autoencoder
with NoPeek
VGG whitebox
with NoPeek
Reconstruction testing: NoPeek
Colorectal histology image public dataset
Decreasing
leakage
Original
Activation
Original
Activation
Original
Original
Activation
Activation
Traditional
NoPeek
DISCO: Pruning to prevent reconstruction
Differential Privacy (Zoomed-out view)
Learning nothing about an individual
while learning useful statistical info about a population
• Is the provided (noisy) data enough for task accuracy? (Utility-privacy tradeoff)
(ε, δ)-Differential Privacy:
• The distribution of the output M(D) (a query) on database D is nearly the
same as M(D′) for all adjacent databases D and D′:
∀S: Pr[M(D) ∈ S] ≤ exp(ε) ∙ Pr[M(D′) ∈ S]+δ.
Example mechanisms: Laplace Mechanism for ε-DP, Gaussian Mechanism for
(ε, δ)-DP, Contractive Noisy Iterations, and more.
Two key properties: Post-processing (once DP, DP forever) & composition (loss of
privacy via multiple queries)
How much noise to add? Depends on Global Sensitivity of query (not data).
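As a small illustration of the Laplace mechanism named above, the sketch below answers a counting query with ε-DP. A count changes by at most 1 when one record is added or removed, so its global sensitivity is 1 and the noise scale is 1/ε. The function names and the toy records are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Counting query with Laplace noise calibrated to sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 74]
    print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

Under basic sequential composition, asking k such queries at ε each spends kε in total, which is the "loss of privacy via multiple queries" noted above.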
Zoomed-in view of DP
Retrieve Nearest
Matches to
Privatized Image
Query
Client
Private Image Retrieval with Differential Privacy
Differentially Private Supervised Manifold Learning with Applications like Private Image Retrieval, Vepakomma, Balla, Raskar, 2021
Lifecycle of PrivateMail
Key: Perform retrieval after ‘differentially private’ manifold learning on DL features for IR
Big Question: How to Perform ‘differentially private’ manifold learning?
Non-Private Supervised Manifold Embeddings
PrivateMail Protocol
Differentially Private Supervised Manifold Embeddings
Local sensitivity
Global sensitivity
Can you spot the
difference?
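For reference, the standard textbook definitions of the two sensitivities are given below; the choice of the \ell_1 norm (the one used to calibrate Laplace noise) and the symbols GS_f and LS_f are assumptions, since the slide's own formulas did not survive extraction.

```latex
\[
GS_f \;=\; \max_{D \sim D'} \lVert f(D) - f(D') \rVert_1 ,
\qquad
LS_f(D) \;=\; \max_{D' \,:\, D' \sim D} \lVert f(D) - f(D') \rVert_1 .
\]
```

The difference is what is maximized over: global sensitivity ranges over every pair of adjacent databases, while local sensitivity fixes the actual database D and ranges only over its neighbours, so that GS_f = \max_D LS_f(D).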
Private Image Retrieval
PrivateMail vs. DP d-SNE vs. non-private t-SNE
Compute
Compute or memory on clients (e.g. IoT)
Unknown / large # of Clients
Unreliable nodes (slow downs, faults)
Adversarial clients
Trusting services, unregulated/unethical use
Ownership+control of ecosystem
Compute Local vs Remote tradeoff
Communication Bandwidth
Limited communication (few rounds, unreliable
comms)
Latency of training (queuing delays)
Dynamic availability (streaming, time zone)
Data
Too little or too much data per client
Unbalanced (size, distribution, adversarial)
Highly non-IID (heterogeneous data)
Personalization
Vertical partitions (instead of horizontal)
Exposing labels
ML
Validation on hidden data
Parameter tuning
Convergence guarantee
Incremental updates (new features)
Data leakage/ Invertibility
Challenges/Choices in Decentralized & Private ML: Compute, Privacy, BW,
Mem, Accuracy/Utility
THANK YOU/ Q&A
Split learning ported into PySyft & FedML.ai
Project Website: splitlearning.mit.edu
Annual Research Workshop: SLDML
Tutorials/Talk Videos
vepakom@mit.edu
AirMixML
• Over-the-Air Data Mixup for Inherently
Privacy-Preserving Edge Machine Learning
• Wireless communications + PPML
Client creates secure splinters
from raw data
Server computes
on splinters
Send Secure Splinters
Receive intermediate
results
Client Unsplinters for
final result
Splintering, Vepakomma, Raskar et al. 2020
Stochastic Splintering
Example 1: splintering for sigmoid
Example 2: splintering for softmax
Example 3:
splintering
for matrix
inverse
DAMS: Proposed private sketching data structure for set
intersection verification queries
Client
Device
Set Intersection
Verification
Result
• Key Idea: The private sketching algorithm is run q times, using a different dictionary of hash functions in each run
• The final result is obtained as the average of the estimated counts. We refer to this option as our proposed private DAMS estimator.
Theoretical guarantees on utility
• Baseline 1: Use eps = p · eps’ and run the algorithm once with one set of hash functions.
– This is equivalent to the privacy level obtained when the same set of hash functions is used across p runs of the algorithm on the
same dataset, due to the sequential composition property
• Baseline 2: Use eps = eps’ and run the private count-mean-sketch algorithm q times with the same dictionary of hash
functions in each run.
– This is an important baseline for confirming that changing the hash function dictionary across the p
runs is a better option than keeping it the same across the p runs
• DAMS: Diversified Averaging for Meta-estimation of Sketches:
– The private count-mean-sketch algorithm is run q times, using a different dictionary of hash functions in each run
– The final result is obtained as the average of the estimated counts.
– Proved Guarantees:
• Private DAMS estimator is unbiased
• Var(DAMS) < Var(Baseline 1) ; when eps > 2
• Var(DAMS) < Var(Baseline 2) ; always
Sum of covariances of order
DAMS: Meta-estimation of private sketch data structures for differentially private COVID-19 contact tracing,
P.Vepakomma, S.N.Pushpita, R.Raskar, (PRIML AND PPML JOINT EDITION, NeurIPS-2020)
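The averaging idea described for DAMS can be illustrated with a much simplified stand-in: a single-row hashed count with Laplace noise, repeated q times with a different salted hash function per run, then averaged. This is not the private count-mean-sketch from the paper; the hashing scheme, the noise placement, and all names are assumptions, and the sketch does not account for how the privacy budget is split across runs.

```python
import hashlib
import random


def _bucket(item, seed, m):
    """Deterministic salted hash of an item into one of m buckets."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def noisy_sketch_estimate(items, query, seed, m, epsilon, rng):
    """One run: hashed count plus Laplace(0, 1/epsilon) noise, then debias."""
    target = _bucket(query, seed, m)
    count = sum(1 for it in items if _bucket(it, seed, m) == target)
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    n = len(items)
    # Under uniform hashing E[count] = f + (n - f) / m, so this is unbiased for f.
    return (m / (m - 1)) * (count + noise - n / m)


def averaged_estimate(items, query, q, m, epsilon, rng):
    """Average q runs, each using a different hash seed (the diversified part)."""
    runs = [
        noisy_sketch_estimate(items, query, seed, m, epsilon, rng)
        for seed in range(q)
    ]
    return sum(runs) / q


if __name__ == "__main__":
    rng = random.Random(7)
    items = ["a"] * 50 + [f"x{i}" for i in range(20)]
    print(averaged_estimate(items, "a", q=8, m=1024, epsilon=1.0, rng=rng))
```

Using a fresh hash function per run makes the collision errors of the runs roughly independent, which is the intuition behind averaging reducing variance compared with reusing one dictionary.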
DAMS: Improved
TPR/FPR’s & lower
variance than
traditional private
data structures like
Count-Mean-Sketch
Iterative Feedback to Improve Utility-Privacy
Tradeoff for DP Histograms
Key Idea: Use a subsample to privately estimate heavy hitters (HH), and use that as feedback
to distribute epsilon across the HH and non-HH partitions
Quasi-Concave
Set Functions
Induced Quasi-
Concave Set Functions
Duality
Exponential
time
Polynomial
time
Monotone Linkage
Functions
Data
Data Data Data
Broadcast
data
maximal minimal
pi-cluster
pi-series pi-series pi-series
O(K² N log N) + O(log K)
O(log K)
Objective
Find
Where
Parallel Quasi-concave set optimization:
A new frontier that scales without needing submodularity
Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity, Vepakomma, Kempner, Raskar, 2021
Compute, Bandwidth, Accuracy
Heterogeneous Data, Few hospitals
Multiple medical
imaging benchmarks
Split
Federated

 

expeditions praneeth_june-2021

  • 2. Imaging devices, Distributed ML and Privacy. Guiding question: How to create portable skin imaging device(s) (like a wrist watch) that collaborate and train to do better diagnostics without sharing sensitive data? Options: On device (local calculations; only download large datasets); Anonymize (random identifiers); Obfuscate (add noise; quantize; differential privacy); Smash/Encrypt (Smash: convert to lower dimensions or model; Encrypt: use ‘shares’ and multiple servers; ZKP: zero-knowledge proof).
  • 3. How is this incentivized?
  • 5. Assuming a penalty of just $6 per person: $480M. Reality: per-person costs are far higher, in the hundreds or thousands of dollars. Ethical, moral, legal, trust, economic, news and PR repercussions. Follow-up effects.
  • 6. What’s new in DPC @ MIT: Distributed & Private Computation Projects. Distributed ML: Split Learning. AirMixML PoC: Wireless + Formal Rényi DP + ML. Splintering: A foundation for distributed scientific computation. Semantic Privacy: Architectures to prevent empirical reconstruction attacks. Formal Privacy + Images: Differentially Private Image Retrieval. Iterative feedback/private data structures: Improving DP histograms & set intersection verification. Parallel combinatorial optimization without submodularity.
  • 7. Part 1: 'Distributed' ML (DML). Part 2: 'Private' DML & Computation (DPC).
  • 9. Master Algo for Diagnostic, Treatments Device 1 2 3 .. Device Data Train ML 100
  • 10. b1. Federated Learning [McMahan17]. Diagram labels: Server, Client1, Client2, Client3, ..
  • 11. b2. Split Learning [Gupta17, Vepakomma, Singh, Raskar]. Diagram labels: Server, Client1, Client2, Client3, ..; Smasher, Smashed Data, Back Prop.
  • 12. Federated vs. Split, compute and bandwidth: VGG over CIFAR 10; ResNet over CIFAR 100. Vepakomma, Swedish, Gupta, Dubey, Raskar 2018; Raskar, McMahan et al., CVPR 2019
  • 16. Split learning configurations: Classic; Boomerang without sharing labels; Multi-Client/Vertical Partitioning. Diagram labels: Input Data, Labels, Client 1, Client 2, Smasher, Server.
  • 17. Challenges in Federated Learning: IoT: low compute/comms (cannot train models). Health data: few clients (non-homogeneous data). Complex models (large unoptimized models). Too little data per client (unviable to train or send large models). Many untrusted parties (how to encourage 3rd-party developers).
  • 18. Part 2: ‘Private’ DML & Computation (DPC)
  • 19. Privacy and Attacks: Preventing reconstruction. Diagram labels: Client, Server, Smasher, Input Data, Labels, Attacker, Inverse Model, Invert Data, Invert Model.
  • 21. NoPeek-Infer via Decorrelation. Diagram labels: Client, FL Server, Smasher, Input Data 𝐱, Smashed Data 𝐳, Labels. NoPeek-Infer: Preventing reconstruction attacks in distributed predictive inference after on-premise training, Vepakomma, Singh, Zhang, Raskar 2021
  • 23. Reconstruction testing: NoPeek. Panels: Autoencoder with NoPeek; VGG whitebox with NoPeek.
  • 24. Colorectal histology image public dataset. Traditional vs. NoPeek: original images and their activations, with decreasing leakage.
  • 25. DISCO: Pruning to prevent reconstruction
  • 26. DISCO: Pruning to prevent reconstruction
  • 27. Differential Privacy (zoomed-out view): Learning nothing about an individual while learning useful statistical information about a population. • Is the provided (noisy) data enough for task accuracy? (Utility-privacy tradeoff.) (ε, δ)-Differential Privacy: • The distribution of the output M(D) of a mechanism M (a query) on database D is nearly the same as that of M(D′) for all adjacent databases D and D′: ∀S: Pr[M(D) ∈ S] ≤ exp(ε) ∙ Pr[M(D′) ∈ S] + δ. Example mechanisms: Laplace mechanism for ε-DP, Gaussian mechanism for (ε, δ)-DP, contractive noisy iterations, and the list goes on. Two key properties: post-processing (once DP, DP forever) and composition (loss of privacy via multiple queries). How much noise to add? It depends on the global sensitivity of the query (not on the data).
  • 29. Private Image Retrieval with Differential Privacy. Diagram labels: Client; retrieve nearest matches to privatized image query. Differentially Private Supervised Manifold Learning with Applications like Private Image Retrieval, Vepakomma, Balla, Raskar, 2021
  • 30. Lifecycle of PrivateMail. Key: Perform retrieval after ‘differentially private’ manifold learning on DL features for IR. Big question: How to perform ‘differentially private’ manifold learning?
  • 32. PrivateMail Protocol: Differentially Private Supervised Manifold Embeddings. Local sensitivity vs. global sensitivity: can you spot the difference?
  • 34. Private Image Retrieval: PrivateMail vs. DP d-SNE vs. non-private t-SNE
  • 35. Challenges/Choices in Decentralized & Private ML: Compute, Privacy, BW, Mem, Accuracy/Utility. Compute: compute or memory on clients (e.g. IoT); unknown/large number of clients; unreliable nodes (slowdowns, faults); adversarial clients; trusting services, unregulated/unethical use; ownership + control of ecosystem; compute local vs. remote tradeoff. Communication: bandwidth; limited communication (few rounds, unreliable comms); latency of training (queuing delays); dynamic availability (streaming, time zone). Data: too little or too much data per client; unbalanced (size, distribution, adversarial); highly non-IID (heterogeneous data); personalization; vertical partitions (instead of horizontal); exposing labels. ML: validation on hidden data; parameter tuning; convergence guarantee; incremental updates (new features); data leakage/invertibility.
  • 37. Split learning ported into PySyft & FedML.ai. Project website: splitlearning.mit.edu. Annual research workshop: SLDML. Tutorials/talk videos. Contact: vepakom@mit.edu
  • 38. AirMixML • Over-the-Air Data Mixup for Inherently Privacy-Preserving Edge Machine Learning • Wireless communications + PPML
  • 39. Stochastic Splintering: the client creates secure splinters from raw data and sends them; the server computes on the splinters; the client receives the intermediate results and unsplinters them for the final result. Splintering, Vepakomma, Raskar et al., 2020
  • 40. Example 1: splintering for sigmoid
  • 41. Example 2: splintering for softmax
  • 43. DAMS: Proposed private sketching data structure for set intersection verification queries. Diagram labels: Client Device; Set Intersection Verification Result. • Key idea: The private sketching algorithm is run q times, using a different dictionary of hash functions in each run. • The final result is obtained as the average of the estimated counts. We refer to this option as our proposed private DAMS estimator.
  • 44. Theoretical guarantees on utility • Baseline 1: The scenario of using eps = p·eps’, with the algorithm run once with one set of hash functions. – This is equivalent to the privacy level obtained when the same set of hash functions is used across p runs of the algorithm on the same dataset, due to the sequential composition property. • Baseline 2: The scenario of using eps = eps’, while the algorithm is run q times using the same dictionary of hash functions in each run, as part of the private count-mean-sketch algorithm. – This is an important baseline to compare against, to confirm that changing the hash function dictionary across the p runs is a better option than keeping it the same across the p runs. • DAMS (Diversified Averaging for Meta-estimation of Sketches): – The private count-mean-sketch algorithm is run q times, using a different dictionary of hash functions in each run. – The final result is obtained as the average of the estimated counts. – Proved guarantees: • Private DAMS estimator is unbiased • Var(DAMS) < Var(Baseline 1) when eps > 2 • Var(DAMS) < Var(Baseline 2) always. Sum of covariances of order DAMS: Meta-estimation of private sketch data structures for differentially private COVID-19 contact tracing, P. Vepakomma, S. N. Pushpita, R. Raskar (PriML and PPML Joint Edition, NeurIPS 2020)
  • 45. DAMS: Improved TPRs/FPRs and lower variance than traditional private data structures like Count-Mean-Sketch
  • 46. Iterative Feedback to Improve Utility-Privacy Tradeoff for DP Histograms. Key idea: Use a subsample to privately estimate the heavy hitters (HH), and use that estimate as feedback to distribute epsilon across the partitions of heavy hitters and non-heavy hitters (HH and !HH).
  • 47. Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity. Diagram labels: Quasi-Concave Set Functions; Induced Quasi-Concave Set Functions; Monotone Linkage Functions; Duality; exponential time vs. polynomial time; broadcast data; pi-cluster, pi-series; maximal, minimal; O(K² N log N) + O(log K); O(log K). Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity, Vepakomma, Kempner, Raskar, 2021
  • 48. Split vs. Federated: compute, bandwidth, accuracy; heterogeneous data, few hospitals; multiple medical imaging benchmarks.
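
The Laplace mechanism named on slide 27 can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the talk: the function names (`laplace_noise`, `private_count`), the toy `ages` data, and the choice of a counting query are all hypothetical. A counting query changes by at most 1 when one record is added or removed, so its global sensitivity is 1, and adding Laplace noise with scale 1/ε gives ε-DP for that query.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count of the records that satisfy `predicate`.

    Assumption for this sketch: adjacent databases differ in one record,
    so the count has global sensitivity 1 and the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # depends on the query, not on the data
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Toy data (hypothetical): how many people are aged 40 or over?
rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29, 48]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(f"noisy count: {noisy:.2f}")
```

A smaller ε means a larger noise scale and a less accurate count, which is the utility-privacy tradeoff the slide refers to. By the composition property, answering the same query k times this way spends roughly k·ε of privacy budget in total. The sketch samples floating-point noise naively and is not hardened for production use.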