Federated Machine Learning
Andreas Hellander
Co-founder and Lead Scientist, Scaleout Systems
Associate Professor in Scientific Computing, Uppsala University
scaleoutsystems.com it.uu.se
Main issues with the centralized paradigm
in machine learning:
● Private/Proprietary data — Sharing
valuable business data with someone
else is not an option.
● Regulated data — GDPR, HIPAA, etc.
● Practical blockers — data is too big,
the network connection is expensive,
slow or unreliable.
Also, large datasets relevant to AI
problems are controlled by a small number
of large organizations and there are no
great mechanisms for sharing that data
with the data science community.
The data centralization problem
1. Collect and centralize data from
different sources (data lake, cloud).
2. Create ML model using centralised
data (cluster computing)
How can parties come together to create joint
ML models without sharing/pooling data?
Federated Machine Learning
Federated Machine Learning (FedML) is a
distributed machine learning approach
which enables training on decentralised
data.
● Train local machine learning model on
local/private data.
● Combine local model updates into a
global, federated model.
Federated learning addresses the
fundamental problems of centralized AI
such as privacy, ownership, and locality of
data.
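The two steps above (local training, then combining local updates) can be sketched as a small simulation. This is a minimal illustration under assumptions, not the Scaleout implementation: the 1-D linear model, the synthetic data, and the names local_update, aggregate and make_party_data are all hypothetical.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train a 1-D linear model y = w*x + b on one party's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def aggregate(updates, sizes):
    """Combine local models into a global model, weighted by local data size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_party_data(rng, n, x_lo, x_hi):
    """Synthetic private data drawn from y = 3x + 1 plus small noise (assumed)."""
    xs = [rng.uniform(x_lo, x_hi) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]

rng = random.Random(0)
# Three parties, each holding data from a different region of the input space.
parties = [
    make_party_data(rng, 40, -1.0, 0.0),
    make_party_data(rng, 60, 0.0, 1.0),
    make_party_data(rng, 20, -0.5, 0.5),
]

global_model = (0.0, 0.0)
for round_no in range(50):
    # Only model parameters leave each party; the raw data stays local.
    updates = [local_update(global_model, d) for d in parties]
    global_model = aggregate(updates, [len(d) for d in parties])

print(global_model)
```

Each party sees only part of the input range, yet the combined model approaches the shared underlying relation without any data being pooled.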
scaleoutsystems.com/federated-machine-learning
The key benefit of FedML
Lets parties form alliances/networks to
build stronger models than what could be
attained by the parties in isolation.
● Data security and privacy, since data
never moves.
● Powerful data network effects in
industries where data cannot be
transferred.
● Reduced data transfer costs when
data is very large or networks are
unreliable.
N. Gauraha, O. Spjuth, A. Hellander (2019), manuscript in preparation
Early example
FedML on Gboard:
● Local model for search suggestions,
using the context and whether the
suggestion was clicked
● The history is processed on the device,
and then only a model update is
sent to Google
● Based on Federated Averaging, a
scheme to aggregate weights from
locally trained neural nets:
https://arxiv.org/pdf/1602.05629.pdf
https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
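The weight aggregation in Federated Averaging can be written compactly as below. The notation is an assumption chosen for this sketch: S_t is the set of clients taking part in round t, n_k the number of local samples on client k, \eta the local learning rate, and \ell the local loss on a mini-batch b.

```latex
\begin{align}
  % Local training on client k, starting from the current global weights w_t
  w^{k} &\leftarrow w^{k} - \eta \, \nabla \ell\!\left(w^{k}; b\right),
      \qquad w^{k} \text{ initialised to } w_t \\
  % Server aggregation: average of local weights, weighted by local data size
  w_{t+1} &= \sum_{k \in S_t} \frac{n_k}{m_t} \, w^{k}_{t+1},
      \qquad m_t = \sum_{k \in S_t} n_k
\end{align}
```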
Smart software on top of decentralized
infrastructure/instruments
● Lets an instrument/software vendor
build smarter software.
● Digital pathology, medical dosimetry,
predictive maintenance etc.
● Sensitive data does not need to be
shared.
● Powerful network effects possible.
[Diagram labels: Federated Model; Software services; Federated learning system; Infrastructure vendor]
Integrity-preserving E-health
● Digital tools/video surveillance in
home care.
● Train and deploy models based on
homeowners’ private interactions
without collecting data centrally.
Privacy-preservation features of FedML
● Input privacy is simplified since data
does not move (it is handled according
to local policies)
● Output privacy - depends on the
algorithm, how easy it is to invert the
model etc.
● What can be learned from the
coordination of computation?
○ Different for federated averaging
and ensemble methods
(algorithm dependent)
UN Handbook on Privacy-Preserving Computation Techniques
Differential privacy & homomorphic encryption in FedML
Differential privacy: Add
carefully calibrated noise
(protects against inference
attacks)
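A common way to apply this idea to model updates is to clip each update to a bounded norm and then add Gaussian noise. The sketch below assumes that approach; the function names and the parameters clip_norm and noise_multiplier are hypothetical, and choosing the noise level to meet a formal privacy budget is not shown.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def privatize(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise proportional to the clip bound."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm  # assumed calibration rule
    return [v + rng.gauss(0.0, sigma) for v in clipped]

rng = random.Random(42)
raw_update = [3.0, -4.0, 0.0]  # L2 norm is 5
clipped = clip_update(raw_update, clip_norm=1.0)
noisy = privatize(raw_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
print(clipped)
print(noisy)
```

Clipping bounds how much any single party can influence the aggregate, which is what lets the added noise mask an individual contribution.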
Homomorphic encryption: Methods
work on encrypted data
Secure multiparty computation:
Aggregate/compute without a
trusted third-party
provider/server.
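One way to aggregate without exposing individual updates is pairwise masking: every pair of parties shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. The toy sketch below assumes integer-encoded updates and simulates the shared masks with one seeded generator; a real protocol would derive them through key agreement with a cryptographically secure source and would handle dropouts. All names are hypothetical.

```python
import random

MODULUS = 2**61 - 1  # arithmetic is done modulo a large number (assumed choice)

def pairwise_masks(num_parties, length, rng):
    """Build one mask per party such that all masks sum to zero mod MODULUS."""
    masks = [[0] * length for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            # Secret shared by parties i and j only (simulated here).
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """What a party actually sends: its update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def aggregate(masked_updates):
    """The aggregator sums masked updates; the pairwise masks cancel out."""
    length = len(masked_updates[0])
    return [sum(m[k] for m in masked_updates) % MODULUS for k in range(length)]

rng = random.Random(7)
updates = [[5, 1, 9], [2, 8, 4], [7, 3, 6]]
masks = pairwise_masks(len(updates), 3, rng)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [14, 12, 19]
```

The aggregator recovers the exact sum while each masked update on its own looks random.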
R&D challenges
Scalability and ML
performance
How do we (re)design
algorithms and frameworks
to scale out to the fog and
edge?
Decentralized computation
How can we do FedML
without a trusted
third party?
Adversarial ML
How can we make the
system robust to dishonest
members and external
threats?
FedML is a research area that integrates many different areas of
computer science and mathematics.
Backdooring federated learning
● A major threat to a FedML alliance
comes from within the alliance / from
compromised members.
● Large alliances can be expected to be
relatively robust to data poisoning
attacks.
● Bagdasaryan et al. show how their
proposed approach of model
replacement can efficiently introduce
backdoors in a global model.
● Secure aggregation/MPC hides individual
model updates, which makes it impossible
to detect a malicious update or identify
who submitted it!
Bagdasaryan et al. How to backdoor federated learning (2019) https://arxiv.org/pdf/1807.00459.pdf
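The arithmetic behind model replacement can be sketched as follows, to show why a single participant can dominate an averaged update. This is a simplified reading of the cited paper under assumptions: the server applies G' = G + (eta / n) * sum(L_i - G), honest updates stay close to G, and the attacker scales its update by n / eta. Function names and numbers are hypothetical.

```python
def server_update(global_model, local_models, eta, n):
    """Assumed aggregation rule: G' = G + (eta / n) * sum_i (L_i - G)."""
    dim = len(global_model)
    delta = [sum(l[k] - global_model[k] for l in local_models) for k in range(dim)]
    return [g + eta / n * d for g, d in zip(global_model, delta)]

def replacement_update(global_model, target, eta, n):
    """Attacker's submission: the target model scaled up to survive averaging."""
    gamma = n / eta
    return [gamma * (x - g) + g for x, g in zip(target, global_model)]

global_model = [1.0, 2.0]
# Honest members near convergence submit models close to the global model.
honest = [[1.01, 1.99], [0.99, 2.02], [1.0, 2.0]]
target = [5.0, -3.0]  # model the attacker wants the alliance to end up with
n, eta = 4, 1.0

malicious = replacement_update(global_model, target, eta, n)
new_global = server_update(global_model, honest + [malicious], eta, n)
print(malicious)
print(new_global)
```

In this sketch the scaled update is far larger than the honest ones, which is the kind of anomaly an aggregator could flag only if it can inspect individual updates.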
Federated learning in production
[Diagram labels: ML pipeline with API (three instances); Federated components: secure model communication, anomaly detection, etc.; Global model serving]
A problem that spans many complex areas
● Decentralized computing / fog computing
● Information and security/systems security expertise
● Trust-mechanisms (third-party or decentralized protocol)
● Machine learning algorithms designed for/adapted to a decentralized setting
● Adversarial ML
○ Data poisoning
○ Inference attacks
○ …
A considerable increase in system and developer complexity
compared to the standard paradigm!
Scaleout Federated Platform
Scaleout Studio | Developing
Scaleout Store | Packaging & Deploying
Scaleout Serve | Serving
ML studio
- Ingestion
- Prepare & Analyse Data
- Modeling & Testing
- Training
ML workflow automation
- Automated ML Studio
Pipelines
API
API
Model management
- Versioning
- Annotation
- Storage
- Distribution
API
Model serving
- Traffic management
- Authentication/Authorization
- Policies
- Monitoring
Monitoring & Visualizations
API
API
Endpoint registry
Graphical User Interface
Incl Pipeline Visualization
Authentication and Authorization
Model Sharing
Joint Training
Federation
Orchestration
Federation Identity & Security
Federation Cross Validation & Holdout Set
scaleoutsystems.com
Thank you!
SCALEOUT
Bridging the gap between research and
production grade systems in machine
learning. Learn more about our Lean AI
framework, and our Federated Machine
Learning platform.
ANDREAS HELLANDER
andreas.hellander@it.uu.se
SALMAN TOOR
salman.toor@it.uu.se
Scaleout FedML platform demo at
Testa Center, GE Healthcare
https://www.youtube.com/watch?v=K-JUNkAYs-4

Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Secure AI" - Andreas Hellander
