Federated Learning

Federated Learning makes it possible to build machine learning systems without direct access to the training data. The data remains in its original location, which helps ensure privacy, reduces network communication costs, and takes advantage of computing resources on edge devices. The data minimization principles established by the GDPR and the growing prevalence of smart sensors make the advantages of federated learning all the more compelling. Federated learning is a great fit for smartphones, industrial and consumer IoT, healthcare, and other privacy-sensitive use cases.

We’ll present the Fast Forward Labs team’s research on this topic and the accompanying prototype application, “Turbofan Tycoon”: a simplified working example of federated learning applied to a predictive maintenance problem. In the demo scenario, customers of an industrial turbofan manufacturer are unwilling to share with the manufacturer the details of how their components failed, but still want the manufacturer to provide a strategy for maintaining the part. Federated learning allows us to satisfy the customers’ privacy concerns while providing them with a model that leads to fewer costly failures and less maintenance downtime.

We’ll discuss the advantages and tradeoffs of taking the federated approach. We’ll assess the state of tooling for federated learning, the circumstances in which you might want to consider applying it, and the challenges you’d face along the way.

Speaker
Chris Wallace
Data Scientist
Cloudera

Federated Learning

  1. FEDERATED LEARNING. Chris J Wallace • Data Scientist • Cloudera Fast Forward Labs • @_cjwallace
  2. Available to Fast Forward Labs clients. Play at turbofan.fastforwardlabs.com
  3. TODAY: ● WHY CARE? ● FEDERATED AVERAGING ● PROTOTYPE ● CHALLENGES ● TOOLS
  4. WHY CARE ABOUT FEDERATED LEARNING?
  5. (image-only slide)
  6. (image-only slide)
  7. (image-only slide)
  8. (image-only slide)
  9. REQUIREMENTS FOR FEDERATED LEARNING ● Performance improves with more data. ● Models can be meaningfully combined. ● Nodes can train models, not only predict. (Chart: performance vs. amount of data.)
  10. (image-only slide)
  11. FEDERATED AVERAGING. Communication-Efficient Learning of Deep Networks from Decentralized Data, McMahan et al. (2016)
  12. A network of nodes shares models, rather than training data, with the server.
  13. The server has an untrained model.
  14. It sends a copy of that model to the nodes.
  15. The nodes now also have the untrained model.
  16. The nodes have data on which to train their model.
  17. Each node trains the model to fit the data it has.
  18. Each node sends a copy of its trained model back to the server.
  19. The server combines these models by taking an average. We repeat the whole process many times.
  20. The server now has a model that captures the patterns in the training data on all the nodes. At no point did the nodes share their training data, which increases privacy and saves on bandwidth. (A minimal code sketch of this loop appears after the slide list.)
  21. FEDERATED AVERAGING CAN HANDLE... ● Non-IID data ○ Training data on each node can be idiosyncratic. ● Unbalanced data ○ Unequal amounts of data on each node. ● Massively distributed data ○ There can be many more devices than training examples per node. ● Limited communication ○ Availability of nodes cannot be guaranteed. (Communication-Efficient Learning of Deep Networks from Decentralized Data, McMahan et al. 2016)
  22. TURBOFAN TYCOON. turbofan.fastforwardlabs.com
  23. PREDICTIVE MAINTENANCE: Corrective, Preventative, Predictive
  24. turbofan.fastforwardlabs.com. CMAPSS data set: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ (A data-loading sketch appears after the slide list.)
  25. CHALLENGES
  26. SYSTEMS ISSUES: Power consumption, dropped connections, stragglers.
  27. PRIVACY: An adversary can't inspect the data, but can inspect the model.
  28. PRIVACY: Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning, Hitaj et al. (2017)
  29. TOOLS
  30. OpenMined https://www.openmined.org/ “OpenMined is an open-source community focused on researching, developing, and promoting tools for secure, privacy-preserving, value-aligned artificial intelligence.” ● More than federated learning. ● PySyft is a library for privacy-preserving deep learning. ● Grid is a peer-to-peer platform for decentralized data science.
  31. TensorFlow Federated https://www.tensorflow.org/federated ● Federated Learning API ○ Wrap TensorFlow models in included FL implementations. ○ High level, with attention paid to separating the concerns of models, communication, and so on. ● Federated Core API ○ Low-level interfaces for building novel FL algorithms. Local simulation runtime only right now. New, but promising. (A hedged usage sketch appears after the slide list.)
  32. SUMMARY
  33. Federated Learning is machine learning on decentralized data.
  34. YOU MIGHT HAVE A USE CASE IF… ● Privacy is needed (FL is not the whole solution). ● Bandwidth or power consumption are concerns. ● The cost of data transfer is high. ● Your model improves with more data.
  35. EXAMPLES ● Predictive maintenance/industrial IoT ● Smartphones ● Healthcare (wearables, drug discovery, prognostics, etc.) ● Enterprise/corporate IT (chat, issue trackers, email, etc.)
  36. Cloudera Fast Forward Labs: An introduction to Federated Learning (Cloudera VISION blog, business audience) • Federated learning: distributed machine learning with data locality and privacy (FFL blog, more technical) • Turbofan Tycoon (working prototype; see the FFL blog post for details). Other blog posts: Collaborative Machine Learning without Centralized Training Data (Google research blog) • Federated Learning for Firefox (florian.github.io) • Federated Learning for wake word detection (snips.ai on medium.com). Papers: Communication-Efficient Learning of Deep Networks from Decentralized Data, McMahan et al. (Google, 2016) • Practical Secure Aggregation for Privacy-Preserving Machine Learning, Bonawitz et al. (Google, 2017) • Federated Multi-Task Learning, Smith et al. (2017) • A generic framework for privacy preserving deep learning, Ryffel et al. (2018; see also github.com/OpenMined/PySyft) • Federated Learning for Mobile Keyboard Prediction, Hard et al. (Google, 2018)
  37. THANK YOU. cffl@cloudera.com @_cjwallace
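The federated averaging loop walked through in slides 13 to 20, together with the weighted combination and per-round node sampling from McMahan et al. (2016), can be written down in a few lines. The sketch below is illustrative only, not the Turbofan Tycoon implementation: it assumes each node object exposes hypothetical train() and num_examples() methods and that the model is represented as a flat NumPy-style parameter vector.

    import random

    def federated_averaging(server_model, nodes, rounds=100, client_fraction=0.1):
        """Illustrative federated averaging loop (after McMahan et al. 2016).

        server_model: a flat array of parameters (e.g. a NumPy vector).
        nodes: objects assumed to expose train(model) -> updated parameters
               and num_examples() -> int. These are placeholders, not the
               prototype's actual interface.
        """
        for _ in range(rounds):
            # Limited communication: only a fraction of the nodes takes part each round.
            k = max(1, int(client_fraction * len(nodes)))
            participants = random.sample(nodes, k)

            # Each participating node trains a copy of the current server model on
            # its own data and sends back only the updated parameters, never the data.
            updates = [(node.train(server_model.copy()), node.num_examples())
                       for node in participants]

            # Unbalanced data: weight each node's model by how much data it holds.
            total = sum(n for _, n in updates)
            server_model = sum(params * (n / total) for params, n in updates)

        return server_model

In the prototype's terms, each node would be one customer's site: raw run-to-failure data never leaves the customer, yet the averaged model captures patterns from all of them.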
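Slide 24 points at NASA's CMAPSS turbofan degradation data, which the prototype uses. As a hedged sketch, assuming the standard whitespace-delimited train_FD001.txt layout (unit id, cycle, three operating settings, 21 sensor channels; the column names here are my own), the data can be loaded and given a remaining-useful-life target like this:

    import pandas as pd

    # Assumed CMAPSS FD001 layout: whitespace-delimited, no header,
    # columns = unit id, cycle, 3 operating settings, 21 sensor channels.
    columns = (["unit", "cycle", "setting_1", "setting_2", "setting_3"]
               + [f"sensor_{i}" for i in range(1, 22)])
    df = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=columns)

    # Each engine in the training set runs to failure, so remaining useful life
    # (RUL) at any cycle is that engine's final cycle minus the current cycle.
    df["rul"] = df.groupby("unit")["cycle"].transform("max") - df["cycle"]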
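For the TensorFlow Federated slide (31), the rough shape of the high-level Federated Learning API looks like the sketch below. This follows the TFF tutorials from around the time of this talk, so treat it as an outline rather than copy-paste code: the exact names and signatures have shifted across TFF releases (build_federated_averaging_process has since been superseded in newer versions), and create_keras_model() and federated_train_data are placeholders you would supply.

    import tensorflow as tf
    import tensorflow_federated as tff

    def model_fn():
        # create_keras_model() is a placeholder for your own Keras model; the
        # input_spec must match the element spec of your per-client datasets.
        keras_model = create_keras_model()
        return tff.learning.from_keras_model(
            keras_model,
            input_spec=federated_train_data[0].element_spec,
            loss=tf.keras.losses.MeanSquaredError())

    # Wrap the model in TFF's built-in federated averaging implementation.
    iterative_process = tff.learning.build_federated_averaging_process(
        model_fn,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02))

    state = iterative_process.initialize()
    for round_num in range(10):
        # federated_train_data: a list of tf.data.Datasets, one per participating node,
        # run here in TFF's local simulation runtime.
        state, metrics = iterative_process.next(state, federated_train_data)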
