SlideShare a Scribd company logo
1 of 83
Download to read offline
Federated Learning,
from Research to Practice
Brendan McMahan
Presenting the work of many
CMU 2019.09.05 g.co/federated
Federated learning
Enable machine learning engineers and data scientists
to work productively with decentralized data
with privacy by default
Outline
1. Why do we need federated learning?
2. What is federated learning?
3. Federated learning at Google
4. Federated learning beyond Google
5. Federated learning and privacy
6. Open problems
Why federated learning?
Data is born at the edge
Billions of phones & IoT devices constantly generate data
Data enables better products and smarter models
Data processing is moving on device:
● Improved latency
● Works offline
● Better battery life
● Privacy advantages
E.g., on-device inference for mobile
keyboards and cameras.
Can data live at the edge?
Data processing is moving on device:
● Improved latency
● Works offline
● Better battery life
● Privacy advantages
E.g., on-device inference for mobile
keyboards and cameras.
Can data live at the edge?
What about analytics?
What about learning?
2014: Three choices
Don’t use data to
improve products
and services
Log the data
centrally anyway
Invent a
new solution
Choose your poison 🕱
2014: Three choices
Don’t use data to
improve products
and services
Log the data
centrally anyway
Invent a
new solution
Choose your poison 🕱
2019: Good reason to hope
Don’t use data to
improve products
and services
Log the data
centrally anyway
Federated learning
and analytics
What is federated learning?
cloud
data
Train & evaluate
on cloud data
Model engineer
workflow
Final model
validation steps
Model
deployment
workflow
Model
deployment
workflow
Deploy model
to devices
for on-device
inference
Train & evaluate
on decentralized
data
Model engineer
workflow
opted-in
Federated learning
data device
Need me?
Federated learning
data device
Need me?
Not now
data device
Need me?
Yes!
Federated learning
initial model
Federated learning
data device updated
model
data device
Federated learning
initial model
(ephemeral)
updated
model
data device
Federated learning
initial model
updated
model
Privacy principle
Focused collection
Devices report only what is
needed for this computation
data device
Federated learning
initial model
updated
model
Privacy principle
Ephemeral reports
Server never persists
per-device reports
Stronger with SMPC
data device
combined
model
∑
Federated learning
initial model
updated
model
data device
combined
model
∑
Federated learning
initial model
updated
model
Privacy principle
Only-in-aggregate
Engineer may only
access combined
device reports
(another)
initial model
data device
combined
model
Federated learning
(another)
combined
model
(another)
initial model
∑
Federated learning
data device
data device
(another)
combined
model
(another)
initial model
∑
Typical orders-of-magnitude
100-1000s of users per round
100-1000s of rounds to convergence
1-10 minutes per round
Federated learning
Characteristics of federated learning
vs. traditional distributed learning
Data locality and distribution
● massively decentralized, naturally arising
(non-IID) partition
● Data is siloed, held by a small number of
coordinating entities
● system-controlled (e.g. shuffled, balanced)
Data availability
● limited availability, time-of-day variations
● almost all data nodes always available
Addressability
● data nodes are anonymous and interchangeable
● data nodes are addressable
Node statefulness
● stateless (generally no repeat computation)
● stateful
Node reliability
● unreliable (~10% failures)
● reliable
Wide-area communication pattern
● hub-and-spoke topology
● peer-to-peer topology (fully decentralized)
● none (centralized to one datacenter)
Distribution scale
● massively parallel (1e9 data nodes)
● single datacenter
Primary bottleneck
● communication
● computation
∑
∑
Constraints in the
federated setting
H. Eichner, et al. Semi-Cyclic Stochastic Gradient Descent. ICML 2019.
Each device reflects one users' data.
So no one device is representative of
the whole population.
Devices must idle, plugged-in, on wi-fi
to participate.
Device availability correlates with both
geo location and data distribution.
Round completion rate by hour (US)
Hour (over three days)
data device
(another)
combined
model
(another)
initial model
∑
Federated Averaging
Algorithm
Devices run multiple
steps of SGD on their
local data to compute
an update.
Server computes
overall update using
a simple weighted
average.
McMahan, et al. Communication-Efficient Learning of Deep
Networks from Decentralized Data. AISTATS 2017.
Server
Until Converged:
1. Select a random subset of clients
2. In parallel, send current parameters θt
to those clients
3. θt+1
= θt
+ data-weighted average of client updates
Selected Client k
1. Receive θt
from server.
2. Run some number of minibatch SGD steps,
producing θ'
3. Return θ'-θt
to server.
H. B. McMahan, et al.
Communication-Efficient Learning of
Deep Networks from Decentralized
Data. AISTATS 2017
The Federated Averaging algorithm
θt
θ'
Rounds to reach 10.5% Accuracy
23x decrease in
communication
rounds
H. B. McMahan, et al.
Communication-Efficient Learning of
Deep Networks from Decentralized
Data. AISTATS 2017
Large-scale LSTM for next-word prediction
Dataset: Large Social Network, 10m public posts, grouped by author.
Updates to reach 82%
49x
(IID and balanced data)
decrease in
communication
(updates) vs SGD
H. B. McMahan, et al.
Communication-Efficient Learning of
Deep Networks from Decentralized
Data. AISTATS 2017
CIFAR-10 convolutional model
Federated learning at Google
Federated learning at Google
0.5B installs
Daily use by multiple teams, dozens of ML engineers
Powering features in Gboard and on Pixel Devices
Federated Learning on Pixel Phones
Gboard: mobile keyboard
● Predict the next word based on typed text so far
● Powers the predictions strip
Gboard: language modeling
When should you consider federated learning?
● On-device data is more relevant than
server-side proxy data
● On-device data is privacy sensitive or large
● Labels can be inferred naturally from user
interaction
Gboard: language modeling
Federated model
compared to baseline A. Hard, et al. Federated Learning for
Mobile Keyboard Prediction.
arXiv:1811.03604
Gboard: language modeling
Federated RNN (compared to prior n-gram model):
● Better next-word prediction accuracy: +24%
● More useful prediction strip: +10% more clicks
Other federated models in Gboard
Emoji prediction
● 7% more accurate emoji predictions
● prediction strip clicks +4% more
● 11% more users share emojis!
Action prediction
When is it useful to suggest a gif, sticker, or
search query?
● 47% reduction in unhelpful suggestions
● increasing overall emoji, gif, and sticker
shares
Discovering new words
Federated discovery of what words people
are typing that Gboard doesn’t know.
T. Yang, et al. Applied Federated
Learning: Improving Google Keyboard
Query Suggestions. arXiv:1812.02903
M. Chen, et al. Federated Learning
Of Out-Of-Vocabulary Words.
arXiv:1903.10635
Ramaswamy, et al. Federated Learning
for Emoji Prediction in a Mobile
Keyboard. arXiv:1906.04329.
Google’s federated system
Towards Federated Learning at Scale: System Design
ai.google/research/pubs/pub47976
K. Bonawitz, et al. Towards
Federated Learning at Scale:
System Design. SysML 2019.
Federated learning beyond Google
Federated Learning Research
FL Workshop in Seattle 6/17-18
Federated learning workshops:
● NeurIPS 2019 (upcoming) Workshop on Federated
Learning for Data Privacy and Confidentiality
Submission deadline Sept. 9
● IJCAI 2019 Workshop on Federated Machine Learning for
User Privacy and Data Confidentiality (FML’19)
● Google-hosted June 2019
~40 faculty, 20 students, 40 Googlers
TensorFlow Federated
Experiment with federated technologies
in simulation
TensorFlow Federated
What’s in the box?
Federated Learning (FL) API
● Implementations of federated training/evaluation
● Can be immediately applied to existing TF models/data
Federated Core (FC) API
● Lower-level building blocks for expressing new federated algorithms
Local runtime for simulations
68.0
70.5
69.8
70.1
SERVER
69.5
Federated computation in TFF
the “placement”
type of local items
on each client
has type
68.0
70.5
69.8
70.1
SERVER
69.5
Federated computation in TFF
federated “op” can be interpreted as a function even
though its inputs and outputs are in different places
→
68.0
70.5
69.8
70.1
SERVER
69.5
Federated computation in TFF
federated “op” can be interpreted as a function even
though its inputs and outputs are in different places
→
it represents an abstract specification of
a distributed communication protocol
Federated computation in TFF
50
output: fraction of sensors
with readings > threshold
(an input)
(an input)
1
0
1`
>
to_float
>
>
THRESHOLD_TYPE = tff.FederatedType(tf.float32, tff.SERVER)
@tff.federated_computation(READINGS_TYPE, THRESHOLD_TYPE)
def get_fraction_over_threshold(readings, threshold):
@tff.tf_computation(tf.float32, tf.float32)
def _is_over_as_float(val, threshold):
return tf.to_float(val > threshold)
return tff.federated_mean(
tff.federated_map(_is_over_as_float, [
readings, tff.federated_broadcast(threshold)]))
For more TFF tutorials visit...
tensorflow.org/federated
Federated learning and privacy
ML on Sensitive Data: Privacy versus Utility
Privacy
Utility
ML on Sensitive Data: Privacy versus Utility
Privacy
Utility
perception
today
Privacy
Utility
1. Policy
2. Technology
ML on Sensitive Data: Privacy versus Utility
perception
Privacy
Utility
goal 1. Policy
2. New Technology
ML on Sensitive Data: Privacy versus Utility (?)
Push the pareto frontier
with better technology.
today
federated
training
model
deployment
Improving privacy
federated
training
What can the
network see?
What can the
server see?
What can the
device see?
Improving privacy
What can the
engineer see?
federated
training
Improving privacy
What can the server
admin see?
model
deployment
What can the
world see?
federated
training
Improving privacy
federated
training
model
deployment
What can the
world see?
What can the
network see?
What can the
server see?
What can the
device see?
What can the
engineer see?
Improving privacy
What can the server
admin see?
federated
training
model
deployment
What can the
world see?
What can the
network see?
What can the
server see?
What can the
device see?
What can the
engineer see?
Improving privacy
Ephemeral reports,
focused collection
Encryption at rest
and on the wire.
only-in-aggregate
release
What can the server
admin see?
federated
training
model
deployment
What can the
server see?
Improving privacy
Privacy Technology: Secure aggregation
Secure aggregation
Compute a (vector) sum of encrypted device reports
A practical protocol with
● Security guarantees
● Communication efficiency
● Dropout tolerance
K. Bonawitz, et al. Practical Secure Aggregation for
Privacy-Preserving Machine Learning. CCS 2017.
Secure
aggregation
Privacy Technology: Secure aggregation
Secure exchange of
data-masking pairs
= 0
+
K. Bonawitz, et al. Practical Secure Aggregation for
Privacy-Preserving Machine Learning. CCS 2017.
Secure
aggregation
Privacy Technology: Secure aggregation
All devices in a round
exchange pairs
K. Bonawitz, et al. Practical Secure Aggregation for
Privacy-Preserving Machine Learning. CCS 2017.
Secure
aggregation
Privacy Technology: Secure aggregation
Device report
+ +
+
+
+
+
Report vectors are
completely masked
by random values
K. Bonawitz, et al. Practical Secure Aggregation for
Privacy-Preserving Machine Learning. CCS 2017.
Secure
aggregation
Pairs cancel out when aggregated
together, revealing only the sum
Privacy Technology: Secure aggregation
+
+
+
+
+
+
+ +
+
+
+
+
K. Bonawitz, et al. Practical Secure Aggregation for
Privacy-Preserving Machine Learning. CCS 2017.
Secure
aggregation
Pairs cancel out when aggregated
together, revealing only the sum
Privacy Technology: Secure aggregation
+
+
+
+
+
+
+ +
+
+
+
+
K. Bonawitz, et al. Practical Secure Aggregation for
Privacy-Preserving Machine Learning. CCS 2017.
federated
training
model
deployment
What can the
world see?
What can the
engineer see?
Improving privacy
Gboard: language modeling
● Better next-word prediction accuracy: +24%
● More useful prediction strip: +10% more clicks
Gboard: language modeling
Can a language model be too
good?
Can a language model be too
good?
Gboard: language modeling
4012 8888 8888 1881
Brendan McMahan's credit
card number is ….
Can a language model be too
good?
Gboard: language modeling
4012 8888 8888 1881
Brendan McMahan's credit
card number is ….
Privacy Principle:
Don’t memorize
individuals’ data
Differential privacy can
provide formal bounds
Differential privacy
Dwork and Roth. The Algorithmic
Foundations of Differential Privacy. 2014.
Statistical science of learning common patterns in a
dataset without memorizing individual examples
Differential privacy
complements
federated learning
Use noise to obscure an
individual’s impact on the
learned model.
github.com/tensorflow/privacy
Privacy Principle #4:
Don’t memorize
individuals’ data
data device
updated
model
model
+
∑
1. Devices “clip” their updates, limiting any one
user's contribution
2. Server adds noise when combining updates
H. B. McMahan, et al. Learning Differentially Private
Recurrent Language Models. ICLR 2018.
Privacy Technology: Differentially Private Model Averaging
Open problems
Motivation from real-world constraints
Data locality and distribution
● massively decentralized, naturally arising
(non-IID) partition
● Data is siloed, held by a small number of
coordinating entities
● system-controlled (e.g. shuffled, balanced)
Data availability
● limited availability, time-of-day variations
● almost all data nodes always available
Addressability
● data nodes are anonymous and interchangeable
● data nodes are addressable
Node statefulness
● stateless (generally no repeat computation)
● stateful
Node reliability
● unreliable (~10% failures)
● reliable
Wide-area communication pattern
● hub-and-spoke topology
● peer-to-peer topology (fully decentralized)
● none (centralized to one datacenter)
Distribution scale
● massively parallel (1e9 data nodes)
● single datacenter
Primary bottleneck
● communication
● computation
Sample open problem areas
● Optimization algorithms for FL, particularly
communication-efficient algorithms tolerant
of non-IID data
● Approaches that scale FL to larger models,
including model and gradient compression
techniques
● Novel applications of FL, extension to new
learning algorithms and model classes.
● Theory for FL
● Enhancing the security and privacy of FL,
including cryptographic techniques and
differential privacy
● Bias and fairness in the FL setting (new
possibilities and new challenges)
● Attacks on FL including model poisoning,
and corresponding defenses
● Not everyone has to have the same model
(multi-task and pluralistic learning,
personalization, domain adaptation)
● Generative models, transfer learning,
semi-supervised learning
Finally ...
g.co/federated

More Related Content

Similar to 2019-09-05Federated Learning.pdf

Federated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesFederated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesAlAtfat
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
 
Hac IT 4. Emerging Technologies (1).pdf
Hac IT 4. Emerging Technologies  (1).pdfHac IT 4. Emerging Technologies  (1).pdf
Hac IT 4. Emerging Technologies (1).pdfAAFREEN SHAIKH
 
New Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile AgentNew Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile AgentMohammed Adam
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
 
Introduction to roof computing by Nishant Krishna
Introduction to roof computing by Nishant KrishnaIntroduction to roof computing by Nishant Krishna
Introduction to roof computing by Nishant KrishnaCodeOps Technologies LLP
 
Federated learning
Federated learningFederated learning
Federated learningMindos Cheng
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019VMware Tanzu
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Mark Goldstein
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RPoo Kuan Hoong
 
Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017Ron Bodkin
 
PIMRC-2012, Sydney, Australia, 28 July, 2012
PIMRC-2012, Sydney, Australia, 28 July, 2012PIMRC-2012, Sydney, Australia, 28 July, 2012
PIMRC-2012, Sydney, Australia, 28 July, 2012Charith Perera
 
Gridcomputing
GridcomputingGridcomputing
Gridcomputingpchengi
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Feature Extraction and Analysis of Natural Language Processing for Deep Learn...
Feature Extraction and Analysis of Natural Language Processing for Deep Learn...Feature Extraction and Analysis of Natural Language Processing for Deep Learn...
Feature Extraction and Analysis of Natural Language Processing for Deep Learn...Sharmila Sathish
 

Similar to 2019-09-05Federated Learning.pdf (20)

Federated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devicesFederated learning and its role in the privacy preservation of IoT devices
Federated learning and its role in the privacy preservation of IoT devices
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
Hac IT 4. Emerging Technologies (1).pdf
Hac IT 4. Emerging Technologies  (1).pdfHac IT 4. Emerging Technologies  (1).pdf
Hac IT 4. Emerging Technologies (1).pdf
 
New Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile AgentNew Framework for Improving Bigdata Analaysis Using Mobile Agent
New Framework for Improving Bigdata Analaysis Using Mobile Agent
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
 
Grid computing
Grid computingGrid computing
Grid computing
 
Introduction to roof computing by Nishant Krishna
Introduction to roof computing by Nishant KrishnaIntroduction to roof computing by Nishant Krishna
Introduction to roof computing by Nishant Krishna
 
Federated learning
Federated learningFederated learning
Federated learning
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with R
 
Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017
 
PIMRC-2012, Sydney, Australia, 28 July, 2012
PIMRC-2012, Sydney, Australia, 28 July, 2012PIMRC-2012, Sydney, Australia, 28 July, 2012
PIMRC-2012, Sydney, Australia, 28 July, 2012
 
Gridcomputing
GridcomputingGridcomputing
Gridcomputing
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
Feature Extraction and Analysis of Natural Language Processing for Deep Learn...
Feature Extraction and Analysis of Natural Language Processing for Deep Learn...Feature Extraction and Analysis of Natural Language Processing for Deep Learn...
Feature Extraction and Analysis of Natural Language Processing for Deep Learn...
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 

Recently uploaded

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 

Recently uploaded (20)

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 

2019-09-05Federated Learning.pdf

  • 1. Federated Learning, from Research to Practice Brendan McMahan Presenting the work of many CMU 2019.09.05 g.co/federated
  • 2. Federated learning Enable machine learning engineers and data scientists to work productively with decentralized data with privacy by default
  • 3. Outline 1. Why do we need federated learning? 2. What is federated learning? 3. Federated learning at Google 4. Federated learning beyond Google 5. Federated learning and privacy 6. Open problems
  • 5. Data is born at the edge Billions of phones & IoT devices constantly generate data Data enables better products and smarter models
  • 6. Data processing is moving on device: ● Improved latency ● Works offline ● Better battery life ● Privacy advantages E.g., on-device inference for mobile keyboards and cameras. Can data live at the edge?
  • 7. Data processing is moving on device: ● Improved latency ● Works offline ● Better battery life ● Privacy advantages E.g., on-device inference for mobile keyboards and cameras. Can data live at the edge? What about analytics? What about learning?
  • 8. 2014: Three choices Don’t use data to improve products and services Log the data centrally anyway Invent a new solution Choose your poison 🕱
  • 9. 2014: Three choices Don’t use data to improve products and services Log the data centrally anyway Invent a new solution Choose your poison 🕱
  • 10. 2019: Good reason to hope Don’t use data to improve products and services Log the data centrally anyway Federated learning and analytics
  • 11. What is federated learning?
  • 12. cloud data Train & evaluate on cloud data Model engineer workflow
  • 15. Train & evaluate on decentralized data Model engineer workflow
  • 20. data device Federated learning initial model (ephemeral) updated model
  • 21. data device Federated learning initial model updated model Privacy principle Focused collection Devices report only what is needed for this computation
  • 22. data device Federated learning initial model updated model Privacy principle Ephemeral reports Server never persists per-device reports Stronger with SMPC
  • 24. data device combined model ∑ Federated learning initial model updated model Privacy principle Only-in-aggregate Engineer may only access combined device reports
  • 27. data device (another) combined model (another) initial model ∑ Typical orders-of-magnitude 100-1000s of users per round 100-1000s of rounds to convergence 1-10 minutes per round Federated learning
  • 28. Characteristics of federated learning vs. traditional distributed learning Data locality and distribution ● massively decentralized, naturally arising (non-IID) partition ● Data is siloed, held by a small number of coordinating entities ● system-controlled (e.g. shuffled, balanced) Data availability ● limited availability, time-of-day variations ● almost all data nodes always available Addressability ● data nodes are anonymous and interchangeable ● data nodes are addressable Node statefulness ● stateless (generally no repeat computation) ● stateful Node reliability ● unreliable (~10% failures) ● reliable Wide-area communication pattern ● hub-and-spoke topology ● peer-to-peer topology (fully decentralized) ● none (centralized to one datacenter) Distribution scale ● massively parallel (1e9 data nodes) ● single datacenter Primary bottleneck ● communication ● computation
  • 29. ∑ ∑ Constraints in the federated setting H. Eichner, et al. Semi-Cyclic Stochastic Gradient Descent. ICML 2019. Each device reflects one users' data. So no one device is representative of the whole population. Devices must idle, plugged-in, on wi-fi to participate. Device availability correlates with both geo location and data distribution. Round completion rate by hour (US) Hour (over three days)
  • 30. data device (another) combined model (another) initial model ∑ Federated Averaging Algorithm Devices run multiple steps of SGD on their local data to compute an update. Server computes overall update using a simple weighted average. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
  • 31. Server Until Converged: 1. Select a random subset of clients 2. In parallel, send current parameters θt to those clients 3. θt+1 = θt + data-weighted average of client updates Selected Client k 1. Receive θt from server. 2. Run some number of minibatch SGD steps, producing θ' 3. Return θ'-θt to server. H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017 The Federated Averaging algorithm θt θ'
  • 32. Rounds to reach 10.5% Accuracy 23x decrease in communication rounds H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017 Large-scale LSTM for next-word prediction Dataset: Large Social Network, 10m public posts, grouped by author.
  • 33. Updates to reach 82% 49x (IID and balanced data) decrease in communication (updates) vs SGD H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017 CIFAR-10 convolutional model
  • 35. Federated learning at Google 0.5B installs Daily use by multiple teams, dozens of ML engineers Powering features in Gboard and on Pixel Devices
  • 36. Federated Learning on Pixel Phones
  • 38. ● Predict the next word based on typed text so far ● Powers the predictions strip Gboard: language modeling When should you consider federated learning? ● On-device data is more relevant than server-side proxy data ● On-device data is privacy sensitive or large ● Labels can be inferred naturally from user interaction
  • 39. Gboard: language modeling Federated model compared to baseline A. Hard, et al. Federated Learning for Mobile Keyboard Prediction. arXiv:1811.03604
  • 40. Gboard: language modeling Federated RNN (compared to prior n-gram model): ● Better next-word prediction accuracy: +24% ● More useful prediction strip: +10% more clicks
  • 41. Other federated models in Gboard Emoji prediction ● 7% more accurate emoji predictions ● prediction strip clicks +4% more ● 11% more users share emojis! Action prediction When is it useful to suggest a gif, sticker, or search query? ● 47% reduction in unhelpful suggestions ● increasing overall emoji, gif, and sticker shares Discovering new words Federated discovery of what words people are typing that Gboard doesn’t know. T. Yang, et al. Applied Federated Learning: Improving Google Keyboard Query Suggestions. arXiv:1812.02903 M. Chen, et al. Federated Learning Of Out-Of-Vocabulary Words. arXiv:1903.10635 Ramaswamy, et al. Federated Learning for Emoji Prediction in a Mobile Keyboard. arXiv:1906.04329.
  • 42. Google’s federated system Towards Federated Learning at Scale: System Design ai.google/research/pubs/pub47976 K. Bonawitz, et al. Towards Federated Learning at Scale: System Design. SysML 2019.
  • 44. Federated Learning Research FL Workshop in Seattle 6/17-18 Federated learning workshops: ● NeurIPS 2019 (upcoming) Workshop on Federated Learning for Data Privacy and Confidentiality Submission deadline Sept. 9 ● IJCAI 2019 Workshop on Federated Machine Learning for User Privacy and Data Confidentiality (FML’19) ● Google-hosted June 2019 ~40 faculty, 20 students, 40 Googlers
  • 45. TensorFlow Federated Experiment with federated technologies in simulation
  • 46. TensorFlow Federated What’s in the box? Federated Learning (FL) API ● Implementations of federated training/evaluation ● Can be immediately applied to existing TF models/data Federated Core (FC) API ● Lower-level building blocks for expressing new federated algorithms Local runtime for simulations
  • 47. 68.0 70.5 69.8 70.1 SERVER 69.5 Federated computation in TFF the “placement” type of local items on each client has type
  • 48. 68.0 70.5 69.8 70.1 SERVER 69.5 Federated computation in TFF federated “op” can be interpreted as a function even though its inputs and outputs are in different places →
  • 49. 68.0 70.5 69.8 70.1 SERVER 69.5 Federated computation in TFF federated “op” can be interpreted as a function even though its inputs and outputs are in different places → it represents an abstract specification of a distributed communication protocol
  • 50. Federated computation in TFF 50 output: fraction of sensors with readings > threshold (an input) (an input) 1 0 1` > to_float > >
  • 51. THRESHOLD_TYPE = tff.FederatedType(tf.float32, tff.SERVER) @tff.federated_computation(READINGS_TYPE, THRESHOLD_TYPE) def get_fraction_over_threshold(readings, threshold): @tff.tf_computation(tf.float32, tf.float32) def _is_over_as_float(val, threshold): return tf.to_float(val > threshold) return tff.federated_mean( tff.federated_map(_is_over_as_float, [ readings, tff.federated_broadcast(threshold)]))
  • 52. For more TFF tutorials visit... tensorflow.org/federated
  • 54. ML on Sensitive Data: Privacy versus Utility Privacy Utility
  • 55. ML on Sensitive Data: Privacy versus Utility Privacy Utility perception
  • 56. today Privacy Utility 1. Policy 2. Technology ML on Sensitive Data: Privacy versus Utility perception
  • 57. Privacy Utility goal 1. Policy 2. New Technology ML on Sensitive Data: Privacy versus Utility (?) Push the pareto frontier with better technology. today
  • 59. federated training What can the network see? What can the server see? What can the device see? Improving privacy
  • 60. What can the engineer see? federated training Improving privacy What can the server admin see?
  • 61. model deployment What can the world see? federated training Improving privacy
  • 62. federated training model deployment What can the world see? What can the network see? What can the server see? What can the device see? What can the engineer see? Improving privacy What can the server admin see?
  • 63. federated training model deployment What can the world see? What can the network see? What can the server see? What can the device see? What can the engineer see? Improving privacy Ephemeral reports, focused collection Encryption at rest and on the wire. only-in-aggregate release What can the server admin see?
  • 65. Privacy Technology: Secure aggregation Secure aggregation Compute a (vector) sum of encrypted device reports A practical protocol with ● Security guarantees ● Communication efficiency ● Dropout tolerance K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
  • 66. Secure aggregation Privacy Technology: Secure aggregation Secure exchange of data-masking pairs = 0 + K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
  • 67. Secure aggregation Privacy Technology: Secure aggregation All devices in a round exchange pairs K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
  • 68. Secure aggregation Privacy Technology: Secure aggregation Device report + + + + + + Report vectors are completely masked by random values K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
  • 69. Secure aggregation Pairs cancel out when aggregated together, revealing only the sum Privacy Technology: Secure aggregation + + + + + + + + + + + + K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
  • 70. Secure aggregation Pairs cancel out when aggregated together, revealing only the sum Privacy Technology: Secure aggregation + + + + + + + + + + + + K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
  • 71. federated training model deployment What can the world see? What can the engineer see? Improving privacy
  • 72. Gboard: language modeling ● Better next-word prediction accuracy: +24% ● More useful prediction strip: +10% more clicks
  • 73. Gboard: language modeling Can a language model be too good?
  • 74. Can a language model be too good? Gboard: language modeling 4012 8888 8888 1881 Brendan McMahan's credit card number is ….
  • 75. Can a language model be too good? Gboard: language modeling 4012 8888 8888 1881 Brendan McMahan's credit card number is …. Privacy Principle: Don’t memorize individuals’ data Differential privacy can provide formal bounds
  • 76. Differential privacy Dwork and Roth. The Algorithmic Foundations of Differential Privacy. 2014. Statistical science of learning common patterns in a dataset without memorizing individual examples
  • 77. Differential privacy complements federated learning Use noise to obscure an individual’s impact on the learned model. github.com/tensorflow/privacy Privacy Principle #4: Don’t memorize individuals’ data
  • 78. data device updated model model + ∑ 1. Devices “clip” their updates, limiting any one user's contribution 2. Server adds noise when combining updates H. B. McMahan, et al. Learning Differentially Private Recurrent Language Models. ICLR 2018. Privacy Technology: Differentially Private Model Averaging
  • 80. Motivation from real-world constraints Data locality and distribution ● massively decentralized, naturally arising (non-IID) partition ● Data is siloed, held by a small number of coordinating entities ● system-controlled (e.g. shuffled, balanced) Data availability ● limited availability, time-of-day variations ● almost all data nodes always available Addressability ● data nodes are anonymous and interchangeable ● data nodes are addressable Node statefulness ● stateless (generally no repeat computation) ● stateful Node reliability ● unreliable (~10% failures) ● reliable Wide-area communication pattern ● hub-and-spoke topology ● peer-to-peer topology (fully decentralized) ● none (centralized to one datacenter) Distribution scale ● massively parallel (1e9 data nodes) ● single datacenter Primary bottleneck ● communication ● computation
  • 81. Sample open problem areas ● Optimization algorithms for FL, particularly communication-efficient algorithms tolerant of non-IID data ● Approaches that scale FL to larger models, including model and gradient compression techniques ● Novel applications of FL, extension to new learning algorithms and model classes. ● Theory for FL ● Enhancing the security and privacy of FL, including cryptographic techniques and differential privacy ● Bias and fairness in the FL setting (new possibilities and new challenges) ● Attacks on FL including model poisoning, and corresponding defenses ● Not everyone has to have the same model (multi-task and pluralistic learning, personalization, domain adaptation) ● Generative models, transfer learning, semi-supervised learning