Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Secure AI" - Andreas Hellander

Federated Machine Learning
Andreas Hellander
Co-founder and Lead Scientist, Scaleout Systems
Associate Professor in Scientific Computing, Uppsala University
scaleoutsystems.com it.uu.se

Main issues with the centralized paradigm
in machine learning:
● Private/Proprietary data — Sharing
valuable business data with someone
else is not an option.
● Regulated data — GDPR, HIPAA, etc.
● Practical blockers — data is too big,
the network connection is expensive,
slow or unreliable.
Also, large datasets relevant to AI
problems are controlled by a small number
of large organizations, and there are no
good mechanisms for sharing that data
with the data science community.

The data centralization problem
1. Collect and centralize data from
different sources (data lake, cloud).
2. Create an ML model using the centralized
data (cluster computing).

How can parties come together to create joint
ML models without sharing/pooling data?

Federated Machine Learning
Federated Machine Learning (FedML) is a
distributed machine learning approach
that enables training on decentralized
data.
● Train a local machine learning model on
local/private data.
● Combine the local model updates into a
global, federated model.
Federated learning addresses the
fundamental problems of centralized AI
such as privacy, ownership, and locality of
data.
scaleoutsystems.com/federated-machine-learning
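A minimal sketch of the two steps above. It assumes a toy 1-D linear model trained with plain SGD and uses a sample-count-weighted average as the combination rule. The function names and the synthetic data are hypothetical illustrations, not Scaleout's API. Only the parameters (w, b) leave each party; the raw (x, y) pairs never do.

```python
import random

def make_party(rng, n):
    """Hypothetical private dataset: n noisy samples of the relation y = 2x + 1."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

def local_train(params, data, lr=0.05, epochs=20):
    """Train locally with plain SGD on squared error; only (w, b) leaves the party."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(updates):
    """Combine (params, n_samples) pairs into one global model, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)

def run_federation(parties, rounds=10):
    """Each round: broadcast the global model, train locally, then aggregate."""
    global_params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_train(global_params, data), len(data)) for data in parties]
        global_params = federated_average(updates)
    return global_params

if __name__ == "__main__":
    rng = random.Random(0)
    parties = [make_party(rng, n) for n in (20, 50, 30)]
    w, b = run_federation(parties)
    print(f"global model: w={w:.2f}, b={b:.2f}")
```

Weighting by sample count means a party with more data pulls the global model harder. An unweighted mean is an equally valid choice when parties should count equally.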

The key benefit of FedML
It lets parties form alliances/networks to
build stronger models than any of the
parties could attain in isolation.
● Data security and privacy: the data
never moves.
● Powerful data network effects in
industries where data cannot be
transferred.
● Reduced data transfer costs when
data is very large or networks are
unreliable.
N. Gauraha, O. Spjuth, A. Hellander (2019), manuscript in preparation

Early example
FedML on Gboard:
● A local model for search suggestions,
using the context and whether a suggestion
was clicked.
● The history is processed on the device,
and only a model update is
sent to Google.
● Based on Federated Averaging, a
scheme to aggregate weights from
locally trained neural nets:
https://arxiv.org/pdf/1602.05629.pdf
https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
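For reference, here is the aggregation step of Federated Averaging from the paper linked above, written for the simple case where all K clients take part in round t. Client k holds n_k samples, starts from the global weights w_t, runs local SGD, and returns w_{t+1}^k. The server then takes the sample-weighted mean:

```latex
w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k},
\qquad n = \sum_{k=1}^{K} n_k
```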

Smart software on top of decentralized
infrastructure/instruments
● Lets an instrument/software vendor
build smarter software.
● Examples: digital pathology, medical dosimetry,
predictive maintenance, etc.
● Sensitive data does not need to be
shared.
● Powerful network effects are possible.
[Diagram: infrastructure vendor, federated learning system, federated model, software services]

Integrity-preserving e-health
● Digital tools/video surveillance in
home care.
● Train and deploy models based on
homeowners’ private interactions
without collecting the data centrally.

Privacy-preservation features of FedML
● Input privacy is simplified since data do
not move (data are handled according to local
policies).
● Output privacy: depends on the
algorithm, how easy it is to invert the
model, etc.
● What can be learned from the
coordination of computation?
○ Different for federated averaging
and ensemble methods
(algorithm dependent)
UN Handbook for Privacy-Preserving Techniques
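A small sketch of why what the coordinator learns is algorithm dependent. It assumes hypothetical 1-D linear models and a simple averaging ensemble, one possible ensemble method among many. With federated averaging, the coordinator receives every party's parameters. With the ensemble, it receives only per-query predictions. For brevity, all local models live in one dict here, although in practice each party would evaluate its own model.

```python
from statistics import mean

# Hypothetical 1-D linear models (w, b), each fitted by one party on its own data.
LOCAL_MODELS = {"party_a": (2.1, 0.9), "party_b": (1.8, 1.2), "party_c": (2.0, 1.0)}

def fedavg_messages(models):
    """Federated averaging: the coordinator receives every party's parameters."""
    return dict(models)

def ensemble_messages(models, x):
    """Simple ensemble: the coordinator receives only each party's prediction for x."""
    return {party: w * x + b for party, (w, b) in models.items()}

def fedavg_predict(models, x):
    """Predict with one global model whose parameters are the unweighted mean."""
    w = mean(w for w, _ in models.values())
    b = mean(b for _, b in models.values())
    return w * x + b

def ensemble_predict(models, x):
    """Predict by averaging the parties' individual predictions."""
    return mean(ensemble_messages(models, x).values())

if __name__ == "__main__":
    print("FedAvg coordinator sees:", fedavg_messages(LOCAL_MODELS))
    print("Ensemble coordinator sees:", ensemble_messages(LOCAL_MODELS, 2.0))
    print(fedavg_predict(LOCAL_MODELS, 2.0), ensemble_predict(LOCAL_MODELS, 2.0))
```

For linear models, both approaches give the same prediction. What differs is the information exposed during coordination: full parameter vectors versus individual predictions.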

Differential privacy, homomorphic encryption & secure multiparty computation in FedML
Differential privacy: add
carefully calibrated noise
(protects against inference
attacks).
Homomorphic encryption: computations
are performed directly on encrypted data.
Secure multiparty computation:
aggregate/compute without a
trusted third-party
provider/server.
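The following sketch illustrates two of these ideas on model updates. First, a differential-privacy-style step clips each update's L2 norm and adds Gaussian noise. Calibrating the noise to a concrete (ε, δ) budget is not shown. Second, pairwise additive masking is the core trick behind secure aggregation: the server sums masked vectors, and the masks cancel. The constants MODULUS and SCALE and all function names are assumptions for illustration. A real protocol derives each pairwise mask from a secret shared by the two parties and handles dropouts. Here, a single RNG stands in for that.

```python
import math
import random

MODULUS = 2**32   # assumed size of the integer ring used for masking
SCALE = 10**6     # assumed fixed-point precision for encoding floats

def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Differential privacy step: clip the update's L2 norm, then add Gaussian noise."""
    norm = math.sqrt(sum(u * u for u in update))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    return [u * factor + rng.gauss(0.0, sigma) for u in update]

def encode(update):
    """Map floats to fixed-point integers modulo MODULUS."""
    return [round(u * SCALE) % MODULUS for u in update]

def decode(vector):
    """Map integers modulo MODULUS back to signed floats."""
    half = MODULUS // 2
    return [((v + half) % MODULUS - half) / SCALE for v in vector]

def mask_updates(encoded, rng):
    """Pairwise masking: for each pair (i, j), i adds a random mask and j subtracts it."""
    masked = [list(vec) for vec in encoded]
    dim = len(encoded[0])
    for i in range(len(encoded)):
        for j in range(i + 1, len(encoded)):
            # In a real protocol this mask comes from a secret shared only by i and j.
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS
    return masked

def sum_mod(masked):
    """Server side: add the masked vectors; the pairwise masks cancel in the sum."""
    return [sum(column) % MODULUS for column in zip(*masked)]

if __name__ == "__main__":
    rng = random.Random(42)
    updates = [[0.5, -1.25], [1.0, 0.25], [-0.5, 2.0]]
    private = [clip_and_noise(u, clip_norm=2.0, noise_multiplier=0.1, rng=rng) for u in updates]
    masked = mask_updates([encode(u) for u in private], rng)
    print("server sees, e.g.:", masked[0])
    print("aggregate of noised updates:", decode(sum_mod(masked)))
```

Each client adding its own noise is one simplification. The noise can instead be added once, centrally, or split across clients so that only the sum carries the intended amount.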

R&D challenges
Scalability and ML performance:
How do we (re)design algorithms and frameworks
to scale out to the fog and edge?
Decentralized computation:
How can we do FedML without a trusted
third-party provider?
Adversarial ML:
How can we make the system robust to dishonest
members and external threats?
FedML is a research area that integrates many different areas of
computer science and mathematics.

Backdooring federated learning
● A big threat to a FedML system comes from
within the alliance, i.e. from
compromised members.
● Large alliances can be expected to be
relatively robust to data poisoning
attacks.
● Bagdasaryan et al. show how their
proposed model replacement approach
can efficiently introduce
backdoors into a global model.
● Secure aggregation/MPC makes it
impossible to detect a malicious
model update, or who submitted it!
Bagdasaryan et al. How to backdoor federated learning (2019) https://arxiv.org/pdf/1807.00459.pdf
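A sketch of the model replacement idea from the paper cited above. The server applies G + (η/n) Σ (L_i − G). An attacker who wants the global model to become its backdoored model X submits γ(X − G) + G with γ = n/η. Near convergence, the honest contributions roughly cancel, so the new global model ends up close to X. This sketch assumes that the attacker knows n and η and that the models are plain parameter lists. The concrete numbers are hypothetical.

```python
import random

def server_update(global_model, local_models, eta, n):
    """Aggregation rule used in Bagdasaryan et al.: G + (eta / n) * sum_i (L_i - G)."""
    return [
        g + (eta / n) * sum(local[k] - g for local in local_models)
        for k, g in enumerate(global_model)
    ]

def model_replacement(global_model, backdoored_model, eta, n):
    """Attacker scales its submission by gamma = n / eta so that X survives averaging."""
    gamma = n / eta
    return [gamma * (x - g) + g for x, g in zip(backdoored_model, global_model)]

if __name__ == "__main__":
    rng = random.Random(0)
    n, eta = 10, 1.0
    global_model = [1.0, 1.0]
    # Near convergence, honest updates stay close to the global model.
    honest = [[g + rng.gauss(0, 0.01) for g in global_model] for _ in range(n - 1)]
    backdoored = [5.0, -3.0]  # hypothetical parameters of a model with a backdoor
    submitted = model_replacement(global_model, backdoored, eta, n)
    new_global = server_update(global_model, honest + [submitted], eta, n)
    print("new global model:", new_global)  # close to the backdoored model
```

The scaled submission is far larger than any honest update. A server that can inspect individual updates might notice this, but under secure aggregation the server sees only the sum.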

Federated learning in production
[Diagram: ML pipelines with APIs (three shown); federated components: secure model communication, anomaly detection, etc.; global model serving]
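One very simple form of anomaly detection on incoming model updates is to flag updates whose distance from the current global model is a robust outlier, scored with the median and the median absolute deviation (MAD). This is a hypothetical heuristic. It is not a description of what the Scaleout platform does. The threshold of 5.0 is an assumption. It also requires the server to see individual updates, which secure aggregation prevents.

```python
import math
from statistics import median

def distance(update, global_model):
    """L2 distance between a submitted model and the current global model."""
    return math.sqrt(sum((u - g) ** 2 for u, g in zip(update, global_model)))

def flag_anomalies(updates, global_model, threshold=5.0):
    """Flag updates whose distance is a robust outlier (median / MAD score above threshold)."""
    dists = [distance(u, global_model) for u in updates]
    med = median(dists)
    mad = median(abs(d - med) for d in dists) or 1e-12  # avoid division by zero
    return [i for i, d in enumerate(dists) if (d - med) / mad > threshold]

if __name__ == "__main__":
    global_model = [1.0, 1.0]
    updates = [[1.1, 1.0], [1.2, 1.0], [1.15, 1.0], [1.12, 1.0], [1.18, 1.0], [41.0, -39.0]]
    print(flag_anomalies(updates, global_model))  # → [5]
```

A scaled-up update, like those used in model replacement attacks, is caught by this check. A subtler poisoned update that stays within the normal range would not be.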

A problem that spans many complex areas
● Decentralized computing / fog computing
● Information security/systems security expertise
● Trust-mechanisms (third-party or decentralized protocol)
● Machine learning algorithms designed for/adapted to a decentralized setting
● Adversarial ML
○ Data poisoning
○ Inference attacks
○ …
A considerable increase in system and developer complexity
compared to the standard paradigm!

Scaleout Federated Platform
Scaleout Studio | Developing
Scaleout Store | Package & Deploying
Scaleout Serve | Serving
[Architecture diagram; components connected via APIs:]
● ML studio: ingestion; prepare & analyse data; modeling & testing; training
● ML workflow automation: automated ML studio pipelines
● Model management: versioning, annotation, storage, distribution
● Model serving: traffic management, authentication/authorization, policies, monitoring
● Monitoring & visualizations
● Endpoint registry
● Graphical user interface, incl. pipeline visualization
● Authentication and authorization
● Federation: model sharing, joint training, orchestration, identity & security, cross validation & holdout set

Thank you!
SCALEOUT
Bridging the gap between research and
production-grade systems in machine
learning. Learn more about our Lean AI
framework and our Federated Machine
Learning platform.
ANDREAS HELLANDER
andreas.hellander@it.uu.se
SALMAN TOOR
salman.toor@it.uu.se
Scaleout FedML platform demo at
Testa Center, GE Healthcare
https://www.youtube.com/watch?v=K-JUNkAYs-4
