SRE[in]con 2019
1. Building Secure and Scalable Machine Learning Pipelines: Challenges and Security Patterns
Anamitra Dutta Majumdar
Thomas Goetze
2. Talk Outline
• ML use cases at LinkedIn
• Phases in ML pipelines and infrastructure
• Security risks and security solution patterns
• Scalability and security control challenges
3. Acknowledgements
• SRE, Data APA, Foundation, and AI teams for partnering in developing security controls for the different phases of the machine learning pipelines.
• House Security Team for spearheading some of the cutting-edge security initiatives at LinkedIn to scale the pace of business innovation in a secure manner.
• Marius Seritan from the Relevance Infra team for helping us put the slides together during the initial phases of preparation.
4. Machine Learning
Model
• Trained on sample data
• Used to make a statistical prediction, e.g. 23% chance a user will click on a job, 10% chance of non-professional content
Feature
• Input to a model
• Category (accountant job, programming job)
• Numeric (years of experience)
• Term vectors (contains "dev ops")
• Construction
• Reduction
5. Machine Learning Use Cases at LinkedIn
PYMK
• Started 12 years ago
• One of the early growth mechanisms
• Heavy innovation in online serving (Gaia), Venice Compute, Graph Convolutional Network
Feed
• Leader in experimentation velocity, hundreds of experiments per quarter
• Custom deployment, dark canary
Job Search
• Multi-layered models
• Generalized linear ensemble
• TensorFlow embeddings
6. Pro-ML Initiative
Tooling to manage all aspects of machine learning.
Unified product and frameworks to reduce routine work.
GOAL: Double the productivity of machine learning engineers.
7. Phases in Machine Learning Pipeline and Pro-ML
Primary security concerns:
• Data exfiltration
• Unauthorized data access
Pipeline components: Feature Engineering, Model Artifact Store, Model Health Assurance, Model Training Compute, Model Deployment Executor, Model Scoring and Selection, Hyperparameter Tuning, Model Explorer, Model Registry, Model Inference Engine, AI Metadata Hub, Pro-ML Workspace App, Plugins/SDKs/Pipeline Framework, Model Runtime Environment, Data Store.
Phases: Experimentation (Model Training and Evaluation), Model Deployment, Model Serving and Inference.
8. Data Storage and Management Infrastructure
• Data Sources: 3rd-party services through GAAP
• Data Ingestion: Gobblin
• Data Storage: Espresso, Oracle DB, HDFS, Venice
• Dataset Access Management Layer: Espresso, Dali View
Compute Orchestration Infrastructure
• A/B testing
• Cluster Management: YARN, K8s master
• Compute Engines
• Workflow Orchestration: Azkaban
• Use cases: Relevance, Analytics, Reporting
11. ML Pipeline Security Challenges
• Experimentation Phase: unauthorized data access, sensitive data leakage
• Model Training Phase: sensitive data generation and leakage
• Deployment Phase: unauthorized model actions, leakage of sensitive models
• Inference Phase: DoS, security misconfiguration, membership inference, vulnerabilities
12. Security Controls and Patterns
Experimentation
• Access controls
• Encryption
• Privacy-preserving libraries
• Feature sensitivity annotation
Model Training
• Encryption
• Authenticated, authorized, and automated flows
• Cleanup of sensitive intermediate datasets
• ML classifiers
Model Deployment
• Dual verification and multi-factor authentication
• Model randomization
• Use of synthetic data
Model Inference
• Visibility into workloads
• Segregation of workloads
• Secure configuration
• Model Health Assurance
13. Examples of Security Controls in the Pro-ML Deployment Phase and Runtime Environment
• Publication requires read access by a privileged account, opt-in policy
• Runtime access to wormhole for reading model artifacts
• Current: encrypted keytab
• Future: service principals using KSudo
14. Dev and EI Validation
In theory, trained models should work for inference just as they were trained.
In reality, the code that uses the models can be non-trivial, and developers need access to models to test in development environments.
How to allow models in EI and DEV:
• Validate no PII in the models
• Obfuscation
• Randomization
15. Challenges: Heterogeneous Authentication Controls for the Offline and Online Worlds
OFFLINE GRID
• Name Node, Data Node
• Distributed ML jobs, ML job scheduler
• Kerberos, delegation tokens, block access tokens
ONLINE
• Service A and Service B authenticate with mutual TLS
• X509 certificates
16. Security Control Pattern: Heterogeneous Authentication and Authorization Control Pattern
Components: Translator Service, Identity Management System, Secret Store, Distributed Compute, Distributed Storage, Service A, Service B
18. Key Takeaways for Building ML Pipeline Security Patterns
Segregation of Infrastructure
• Segregation of storage and computation
• Segregation based on workload sensitivity
• Control plane and data plane components
AI Metadata System
• Model training and inference-time security threats and requirements
• Centralized feature metadata system
Monitoring
• Continuous monitoring
• Scanning
• Security metrics
Security Infrastructure
• Efficient identity management platform wrappers and access layers
• Scalable key management system
Security Control Scaling
• Engineer and operationalize the automation of the security controls
Notes: Major use cases of ML to solve business problems
High Level Phases in ML pipelines
Security risks and security solution patterns in each phase
Scalability challenges in applying security controls to various distributed systems involved in the pipeline
High-level phases in any machine learning pipeline, with each phase acting on data primarily collected from events generated by member activity:
Experimentation phase, where data scientists perform data exploration, problem formulation, feature engineering, and model authoring.
Model Training phase, where the model is trained on a set of features and the best model is selected and pushed to the model artifact store.
Model Deployment phase, where the tooling instructs an online service when a new model is available for serving.
In the last phase, the model is fetched from the model artifact store and loaded to the deployment target, typically an online service, for inference.
One product initiative to unify all workflows within a single pane of glass, with the goal of boosting productivity.
Security is a cross cutting concern in each of the layers and phases
Applying controls is challenging when full service or user attribution is required across phases and infrastructure.
Rule based Authorization enforced at Data access layer
Analytics infra deals with data coming from various sources, which is ingested by highly efficient data ingestion platforms, stored on our PB-scale HDFS, and abstracted as cluster-agnostic logical datasets.
Risks:
Model Experimentation
Unauthorized data access at various points in the pipeline at various phases
Data leakage as part of some of the automated flows into less secure regions in the environment
Combination of two or more non-PII datasets to generate PII Features.
Model Training Phase
Leakage of sensitive data generated in an intermediate training or feature transformation phase.
Models learning PII
Model Deployment
Unauthorized Model actions like Model publication to production environments
Use of vulnerable components as the effect is immediately amplified
Movement of sensitive models into less sensitive areas of the infrastructure
Inference phase:
Model runtime misconfiguration or vulnerable component use leading to resource exhaustion and hence DoS
Authentication of Model Runtime environment to fetch models
Model performance degradation
Access Controls:
In the case of datasets, access controls are implemented based on roles and rules.
Roles are enforced when generating an authenticator (such as a token) that uses a single source of identity truth for entitlement determination; rules are enforced based on the attributes of the target. The pattern is to enforce such rules in a data access layer and to set attributes on the dataset in a metadata hub, with the rules based on those attributes.
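The rule pattern above can be sketched minimally as follows. All names here (datasets, roles, the shape of the metadata hub) are hypothetical placeholders, not LinkedIn's actual schema; the point is that authorization consults dataset attributes from a metadata hub and role entitlements from a single identity source, and fails closed.

```python
# Hypothetical sketch of rule-based authorization in a data access layer.
# Dataset attributes live in a metadata hub; rules are evaluated against
# those attributes rather than being hard-coded per dataset.

# Metadata hub: dataset name -> attributes (illustrative values)
METADATA_HUB = {
    "member_activity": {"sensitivity": "high", "contains_pii": True},
    "job_postings":    {"sensitivity": "low",  "contains_pii": False},
}

# Role entitlements from a single source of identity truth (illustrative)
ROLE_ENTITLEMENTS = {
    "relevance-engineer": {"max_sensitivity": "low"},
    "privacy-reviewer":   {"max_sensitivity": "high"},
}

_LEVELS = {"low": 0, "high": 1}

def authorize_read(role: str, dataset: str) -> bool:
    """Rule: a role may read a dataset only if the dataset's sensitivity
    attribute does not exceed the role's entitlement. Unknown roles or
    datasets fail closed."""
    attrs = METADATA_HUB.get(dataset)
    caps = ROLE_ENTITLEMENTS.get(role)
    if attrs is None or caps is None:
        return False
    return _LEVELS[attrs["sensitivity"]] <= _LEVELS[caps["max_sensitivity"]]
```

Keeping the rule logic in one access layer means a new dataset only needs its attributes registered in the metadata hub, not a new code path.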
Compute engines for training and inference phases should authorize and authenticate to the data by using system level identities.
Encryption:
Sensitive data should be encrypted at rest and in motion. In use cases where encryption cannot be used, the data should be decrypted for the shortest possible period while in use. It is worth mentioning that there are experimental efforts underway to train models on homomorphically encrypted data.
Use privacy-preserving libraries at training time to ensure PII is not leaked through the combination of non-PII fields. The pattern is to integrate these tools and libraries into the model training phase of the pipeline.
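A crude sketch of the underlying risk check: individually non-PII columns (quasi-identifiers) can re-identify people when combined, as in the well-known ZIP + birth date + gender linkage result. The column names and threshold below are illustrative assumptions, not a real privacy library's API.

```python
# Hypothetical sketch: flag dataset combinations whose joint
# quasi-identifiers could re-identify members, even though each
# dataset alone is considered non-PII.

QUASI_IDENTIFIERS = {"zip_code", "birth_date", "gender"}  # illustrative

def combination_risky(columns_a, columns_b, threshold: int = 3) -> bool:
    """Return True if the combined datasets expose at least `threshold`
    quasi-identifier columns (a crude linkage-risk heuristic)."""
    combined = (set(columns_a) | set(columns_b)) & QUASI_IDENTIFIERS
    return len(combined) >= threshold
```

A real privacy-preserving library would use much richer measures (k-anonymity, differential privacy budgets), but the pipeline hook is the same: run the check before a join is materialized in the training phase.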
Model Deployment Phase
Model deployment actions like publish and test need verification. For test deployments, randomization techniques prevent leakage of PII to less secure environments. Ideally, tests should be performed using synthetic data.
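One way the randomization technique might look, as a minimal sketch (the function name and noise scale are assumptions; a real implementation would operate on the actual model artifact format): perturb the weights so the exported artifact still exercises the serving code path in dev/EI, but no longer reproduces anything memorized from training data.

```python
# Hypothetical sketch: perturb model weights before shipping an artifact
# to a dev/EI environment, so memorized training data cannot be recovered
# from it while the serving code path is still testable.

import random

def randomize_weights(weights, noise_scale: float = 0.5, seed=None):
    """Add bounded uniform noise to each weight; the perturbed model keeps
    the same shape but different values."""
    rng = random.Random(seed)
    return [w + rng.uniform(-noise_scale, noise_scale) for w in weights]
```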
Model Inference
Visibility into compute workloads of various sensitivity levels to look for abnormal traffic patterns
Segregate workloads based on the sensitivity level of the data they handle, and define key performance metrics based on threat severity.
Model runtime configuration should be secure. One of the tests to ensure this would be running security benchmarks.
An important aspect to monitor is model accuracy degradation over time.
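A minimal sketch of such a monitor, assuming labeled feedback eventually arrives for each prediction (class name, window size, and thresholds are all illustrative): track a rolling accuracy and alert when it falls below the offline baseline by more than a tolerance.

```python
# Hypothetical sketch: rolling-window monitor that flags when online
# model accuracy degrades below an offline baseline.

from collections import deque

class AccuracyMonitor:
    def __init__(self, baseline: float, tolerance: float = 0.05,
                 window: int = 100):
        self.baseline = baseline      # accuracy measured at training time
        self.tolerance = tolerance    # allowed drop before alerting
        self.outcomes = deque(maxlen=window)  # recent prediction outcomes

    def record(self, correct: bool) -> None:
        self.outcomes.append(1 if correct else 0)

    def degraded(self) -> bool:
        """True once rolling accuracy drops below baseline - tolerance."""
        if not self.outcomes:
            return False
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.baseline - self.tolerance
```

In practice the alert would feed the same monitoring stack as the other pipeline metrics, since degradation can signal data drift or an attack as well as ordinary staleness.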
Online services use certificates for authentication, while offline jobs use Kerberos. There are gaps in full user attribution within the context of an operation that spans online and offline entities. This is an issue in cases where an online service needs to contact infrastructure components that rely on token-based authentication and authorization. A good example of this in the actual pipeline is the Model Runtime Environment pulling model files from the Model Artifact Store.
Introduce a translator service that understands online and offline authentication constructs.
The translator services must preserve full user attribution during translation
The translator service must consult a single source of identity truth trusted by both online and offline
Examples are Ksudo and PKInit
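The translator-service requirements above can be sketched as follows. This is not how KSudo or PKInit work internally; it is a toy illustration (the identity map, secret, and token format are all invented) of the two invariants: the translation consults a single source of identity truth, and the resulting offline token still names the original principal.

```python
# Hypothetical sketch of a translator service: exchange an online identity
# (an mTLS certificate subject) for a signed offline-style token, while
# preserving full attribution of the original principal.

import hashlib
import hmac

# Single source of identity truth trusted by both worlds (illustrative)
IDENTITY_TRUTH = {"svc-model-runtime": "urn:li:service:model-runtime"}

_SIGNING_KEY = b"demo-signing-key"  # in practice, from the secret store

def translate(cert_subject: str) -> str:
    """Map a certificate subject to a signed token that still names the
    original principal, so attribution survives the translation."""
    principal = IDENTITY_TRUTH.get(cert_subject)
    if principal is None:
        raise PermissionError(f"unknown identity: {cert_subject}")
    sig = hmac.new(_SIGNING_KEY, principal.encode(),
                   hashlib.sha256).hexdigest()[:16]
    return f"{principal}|{sig}"
```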
Scalability challenges arise when applying security controls due to the tight coupling of compute and storage in the GRID.
Machine learning workloads are inherently distributed in nature. To apply security controls in a scalable manner, important security boundaries need to be determined based on the blast radius of compromise, data leakage, and exfiltration risks. A high-level guideline is to divide the distributed systems along the following dimensions:
Storage tier and compute tier
control plane and data plane components
Sensitivity level of workloads
ML Domain understanding
Some of the security risks are very specific to the ML domain, such as model performance degradation over time or models learning PII during training, so it is important to understand the purpose of each model, the features it is trained on, and so forth.
Another important aspect is to maintain an AI metadata system that supports sensitivity annotation of the various types of data used in the pipeline. Such annotations are extremely useful for decision making around dataflow between different points in the pipeline.
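As a minimal sketch of how those annotations drive dataflow decisions (the region names, datasets, and clearance levels are all hypothetical): a dataset may only flow into a region whose clearance covers the dataset's annotated sensitivity, and anything unannotated is treated as most sensitive.

```python
# Hypothetical sketch: consult AI-metadata sensitivity annotations before
# allowing data to flow into a less secure region of the pipeline.

REGION_CLEARANCE = {"prod": 2, "ei": 1, "dev": 0}   # illustrative levels
DATA_SENSITIVITY = {                                # from the metadata hub
    "raw_member_events":   2,
    "aggregated_features": 1,
    "synthetic_eval_set":  0,
}

def flow_allowed(dataset: str, target_region: str) -> bool:
    """Allow the flow only if the region's clearance covers the dataset's
    annotated sensitivity; unannotated data is assumed most sensitive."""
    sensitivity = DATA_SENSITIVITY.get(dataset, 2)   # fail closed
    clearance = REGION_CLEARANCE.get(target_region, 0)
    return sensitivity <= clearance
```

This is also where the dev/EI validation story connects: synthetic or randomized artifacts carry a low annotation and are therefore the only things allowed to flow into DEV.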
Monitoring
Monitor at various levels and points of the pipeline, across the different phases, capturing a full audit of events performed by human and service principals.
Available and Scalable Security Infrastructure
The identity management system is the core for establishing user attribution based on various authenticators. It is imperative for the identity management system to help generate identities and authenticators for all phases, and, in cases where a workflow spans phases, to help with authenticator translation.
The key management system is essential for storing the encryption keys, private keys, and tokens required at various levels.
Both the identity management and key management systems should be highly available, to prevent a single point of failure, and highly scalable, to meet the demands of calls from massively distributed systems.
Security Control Scaling through Automation
Additionally, functions like scaling, authenticator generation based on understanding of the target use, and detection of PII in unwanted areas need to be automated.