© Cloudera, Inc. All rights reserved.
FEDERATED LEARNING
Chris J Wallace • Data Scientist • Cloudera Fast Forward Labs
@_cjwallace
Available to Fast Forward Labs clients
Play at turbofan.fastforwardlabs.com
TODAY
● WHY CARE?
● FEDERATED AVERAGING
● PROTOTYPE
● CHALLENGES
● TOOLS
WHY CARE ABOUT FEDERATED LEARNING?
REQUIREMENTS FOR FEDERATED LEARNING
● Performance improves with more data.
● Models can be meaningfully combined.
● Nodes can train models, not only predict.
(Figure: performance vs. amount of data.)
FEDERATED AVERAGING
Communication-Efficient Learning of Deep Networks from Decentralized Data
McMahan et al. 2016
A network of nodes shares models, rather than training data, with the server.
The server has an untrained model
It sends a copy of that model to the nodes
The nodes now also have the untrained model
The nodes have data on which to train their model
Each node trains the model to fit the data it has
Each node sends a copy of its trained model back to the server
The server combines these models by taking an average, weighted by how much training data each node has.
We repeat the whole process many times.
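In McMahan et al. (2016), the server's average is weighted by data size, so nodes with more training examples pull the combined model further. Writing $w^k_{t+1}$ for the model returned by node $k$ in round $t$, $n_k$ for that node's number of training examples, and $K$ for the number of nodes taking part in the round (notation adapted from the paper), the averaging step is:

$$
w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w^k_{t+1}, \qquad n = \sum_{k=1}^{K} n_k
$$

For example, two nodes holding 90 and 10 examples get weights 0.9 and 0.1, not 0.5 each.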
The server now has a model that captures the patterns in the training data on all the nodes.
But at no point did the nodes share their training data, which increases privacy and saves on bandwidth.
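The whole loop can be simulated in a few lines. This is a minimal sketch, not the Turbofan Tycoon code: it assumes a hypothetical one-feature linear model (a slope and an intercept) trained by plain gradient descent, three hypothetical nodes whose data follow the same line but cover different ranges of x and hold different numbers of points, and the data-size-weighted average from McMahan et al. Only parameters and example counts travel back to the server; each node's `(x, y)` pairs stay inside `local_train`.

```python
"""Minimal simulation of federated averaging (FedAvg) on a toy linear model."""
from typing import List, Tuple

Model = List[float]                  # [slope, intercept] of y = slope * x + intercept
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave a node


def local_train(model: Model, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Model:
    """Full-batch gradient descent on mean squared error, starting from the server's model."""
    slope, intercept = model  # unpacking copies the floats; the server's list is untouched
    n = len(data)
    for _ in range(epochs):
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in data) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in data) / n
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return [slope, intercept]


def federated_averaging(model: Model, nodes: List[Dataset], rounds: int) -> Model:
    """Repeat: send the model to every node, train locally, average what comes back."""
    for _ in range(rounds):
        # Each node returns trained parameters plus its example count, never its data.
        updates = [(local_train(model, data), len(data)) for data in nodes]
        total = sum(n_k for _, n_k in updates)
        # Weight each node's parameters by its share of the training examples.
        model = [
            sum(n_k * params[i] for params, n_k in updates) / total
            for i in range(len(model))
        ]
    return model


# Hypothetical nodes: all see y = 2x + 1, but over different x ranges (non-IID)
# and with different numbers of points (unbalanced).
nodes = [
    [(i / 10, 2 * (i / 10) + 1) for i in range(0, 10)],   # 10 points, x in [0.0, 0.9]
    [(i / 10, 2 * (i / 10) + 1) for i in range(10, 14)],  # 4 points, x in [1.0, 1.3]
    [(i / 10, 2 * (i / 10) + 1) for i in range(14, 20)],  # 6 points, x in [1.4, 1.9]
]
server_model = federated_averaging([0.0, 0.0], nodes, rounds=200)
print(server_model)  # close to [2.0, 1.0]
```

The `epochs` parameter trades computation for communication: more local training per round means fewer rounds of sending models back and forth, which is the central idea behind the "communication-efficient" in McMahan et al.'s title.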
FEDERATED AVERAGING CAN HANDLE...
● Non-IID data
○ Training data on each node can be idiosyncratic.
● Unbalanced data
○ Unequal amount of data on each node.
● Massively distributed data
○ Can have many more devices than training examples per node.
● Limited communication
○ Cannot guarantee availability of nodes.
Communication-Efficient Learning of Deep Networks from Decentralized Data
McMahan et al. 2016
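Unbalanced data and unavailable nodes both come down to how the server combines what it receives. In this deliberately tiny, hypothetical sketch, each node's "model" is just the mean of its local values, reported together with its example count. Weighting by example count (as in McMahan et al.'s averaging step) recovers the mean over all the data, a plain average over-counts the small nodes, and a node that fails to report (`None`) is simply left out of that round. The function names are illustrative, not from any library.

```python
"""Combining per-node estimates when nodes hold unequal amounts of data."""
from typing import List, Optional, Tuple

Report = Tuple[float, int]  # (locally fitted parameter, number of local examples)


def weighted_average(reports: List[Optional[Report]]) -> float:
    """Average node parameters weighted by example count.

    None marks a node that was unavailable this round: it is skipped and the
    weights are renormalised over the nodes that did report.
    """
    received = [r for r in reports if r is not None]
    if not received:
        raise ValueError("no node reported back this round")
    total = sum(n_k for _, n_k in received)
    return sum(n_k * param for param, n_k in received) / total


def unweighted_average(reports: List[Optional[Report]]) -> float:
    """Plain mean of node parameters, ignoring how much data each node used."""
    received = [r for r in reports if r is not None]
    return sum(param for param, _ in received) / len(received)


# Hypothetical unbalanced nodes: one large node and two small ones.
node_data = [[1.0] * 90, [5.0] * 5, [9.0] * 5]
reports = [(sum(d) / len(d), len(d)) for d in node_data]  # each node's local mean

all_values = [v for d in node_data for v in d]
print(sum(all_values) / len(all_values))  # 1.6, the mean over all data
print(weighted_average(reports))          # 1.6, matches it
print(unweighted_average(reports))        # 5.0, pulled towards the small nodes
print(weighted_average([None, reports[1], reports[2]]))  # 7.0, large node dropped out
```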
TURBOFAN TYCOON
turbofan.fastforwardlabs.com
PREDICTIVE MAINTENANCE
(Figure: corrective, preventative and predictive maintenance.)
turbofan.fastforwardlabs.com
C-MAPSS data set
https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/
CHALLENGES
SYSTEMS ISSUES
● Power consumption
● Dropped connections
● Stragglers
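One simple way a server might live with these issues, sketched under stated assumptions: it only invites nodes that are charging (a hypothetical eligibility rule aimed at power consumption), samples a fraction of them (McMahan et al. sample a fraction C of clients per round), and stops waiting at a deadline, so dropped connections and stragglers simply contribute nothing to that round's average. `Node`, `select_participants` and `collect_reports` are illustrative names, not part of any library.

```python
"""Sketch of a server round that tolerates dropped connections and stragglers."""
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    is_charging: bool        # hypothetical eligibility rule to spare device batteries
    connected: bool          # False simulates a connection dropped mid-round
    seconds_to_train: float  # simulated local training time


def select_participants(nodes: List[Node], fraction: float, rng: random.Random) -> List[Node]:
    """Randomly pick a fraction of the eligible nodes (cf. the client fraction C in FedAvg)."""
    eligible = [n for n in nodes if n.is_charging]
    k = min(len(eligible), max(1, round(fraction * len(eligible))))
    return rng.sample(eligible, k)


def collect_reports(participants: List[Node], deadline: float) -> List[str]:
    """Return the names of nodes whose update arrives before the deadline."""
    reported = []
    for node in participants:
        if not node.connected:
            continue  # dropped connection: no update ever arrives
        if node.seconds_to_train > deadline:
            continue  # straggler: the server stops waiting for it
        reported.append(node.name)
    return reported


fleet = [
    Node("a", is_charging=True, connected=True, seconds_to_train=12.0),
    Node("b", is_charging=True, connected=False, seconds_to_train=8.0),
    Node("c", is_charging=True, connected=True, seconds_to_train=95.0),
    Node("d", is_charging=False, connected=True, seconds_to_train=5.0),
    Node("e", is_charging=True, connected=True, seconds_to_train=30.0),
]
participants = select_participants(fleet, fraction=1.0, rng=random.Random(0))
print(sorted(collect_reports(participants, deadline=60.0)))  # ['a', 'e']
```

The trade-off sits in the deadline: set it too short and slow but useful nodes, along with their data, are systematically under-represented in the averaged model.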
PRIVACY
An adversary can’t inspect the training data, but can inspect the model.
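A minimal sketch of why inspecting the model can be enough, using a much simpler leak than the GAN-based attack of Hitaj et al.: suppose a node takes one gradient step on a single private example with a hypothetical linear model that has a bias term. Each weight then moves by -lr × error × x_i and the bias moves by -lr × error, so dividing the weight changes by the bias change returns the private input x. The model, loss and function names are assumptions for illustration.

```python
"""One shared update from one private example reveals that example's features."""
from typing import List, Tuple


def local_update(weights: List[float], bias: float, x: List[float], y: float,
                 lr: float = 0.1) -> Tuple[List[float], float]:
    """One SGD step for linear regression with loss 0.5 * (prediction - y) ** 2."""
    prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = prediction - y  # derivative of the loss with respect to the prediction
    new_weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * error
    return new_weights, new_bias


def reconstruct_input(old_weights: List[float], old_bias: float,
                      new_weights: List[float], new_bias: float) -> List[float]:
    """Attacker's view: only the models before and after the node's update."""
    delta_bias = new_bias - old_bias  # equals -lr * error
    if delta_bias == 0:
        raise ValueError("zero prediction error: the update reveals nothing about x")
    # Each weight moved by -lr * error * x_i, so dividing by the bias change gives x_i.
    return [(nw - ow) / delta_bias for nw, ow in zip(new_weights, old_weights)]


private_x = [0.5, -1.25, 3.0]  # hypothetical sensitive features held on a node
global_w, global_b = [0.0, 0.0, 0.0], 0.0
shared_w, shared_b = local_update(global_w, global_b, private_x, y=2.0)
print(reconstruct_input(global_w, global_b, shared_w, shared_b))  # private_x, up to rounding
```

With mini-batches or several local steps, the update mixes many examples and this exact division no longer isolates one input, but the update still carries information about the data, which is why federated learning is often paired with protections such as secure aggregation (Bonawitz et al., 2017).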
PRIVACY
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning, Hitaj et al. (2017)
TOOLS
OpenMined
https://www.openmined.org/
“OpenMined is an open-source community focused on researching, developing, and promoting tools for secure, privacy-preserving, value-aligned artificial intelligence.”
● More than federated learning.
● PySyft is a library for privacy-preserving deep learning.
● Grid is a peer-to-peer platform for decentralized data science.
TensorFlow Federated
https://www.tensorflow.org/federated
● Federated Learning API
○ Wrap TensorFlow models in the included FL implementations.
○ High-level, with attention paid to separating the concerns of models, communication, and so on.
● Federated Core API
○ Low-level interfaces for building novel FL algorithms.
Local simulation runtime only right now.
New, but promising.
SUMMARY
Federated Learning is machine learning on decentralized data
YOU MIGHT HAVE A USE CASE IF …
● Privacy is needed (FL is not the whole solution)
● Bandwidth or power consumption is a concern
● Data transfer is costly
● Your model improves with more data
EXAMPLES
● Predictive maintenance/industrial IoT
● Smartphones
● Healthcare (wearables, drug discovery, prognostics, etc.)
● Enterprise/corporate IT (chat, issue trackers, email, etc.)
Cloudera Fast Forward Labs
• An introduction to Federated Learning (Cloudera VISION blog, business audience)
• Federated learning: distributed machine learning with data locality and privacy (FFL blog, more technical)
• Turbofan Tycoon (working prototype, see FFL blog post for some details)
Other blog posts
• Collaborative Machine Learning without Centralized Training Data (Google research blog)
• Federated Learning for Firefox (Firefox on florian.github.io)
• Federated Learning for wake word detection (snips.ai on medium.com)
Papers
• Communication-Efficient Learning of Deep Networks from Decentralized Data by McMahan et al. (Google, 2016)
• Practical Secure Aggregation for Privacy-Preserving Machine Learning by Bonawitz et al. (Google, 2017)
• Federated Multi-Task Learning by Smith et al. (2017)
• A generic framework for privacy preserving deep learning by Ryffel et al. (2018, and see also github.com/OpenMined/PySyft)
• Federated Learning for Mobile Keyboard Prediction by Hard et al. (Google, 2018)
THANK YOU
cffl@cloudera.com
@_cjwallace
