Building Secure and Scalable Machine Learning Pipelines: Challenges and Security Patterns
Anamitra Dutta Majumdar
Thomas Goetze
Talk Outline
• ML use cases at LinkedIn
• Phases in ML pipelines and infrastructure
• Security risks and security solution patterns
• Scalability and security control challenges
Acknowledgements
• SRE, Data APA, Foundation, and AI teams, for partnering with us in developing security controls for the different phases of the machine learning pipelines.
• House Security team, for spearheading some of the cutting-edge security initiatives at LinkedIn that scale the pace of business innovation in a secure manner.
• Marius Seritan from the Relevance Infra team, for helping us put the slides together during the initial phases of preparation.
Machine Learning
Model
• Trained on sample data
• Used to make a statistical prediction, for example:
  • 23% chance that the user will click on a job
  • 10% chance that the content is non-professional
Feature
• Input to a model, for example:
  • Category (accountant job, programming job)
  • Numeric (years of experience)
  • Term vectors (contains "dev ops")
• Construction
• Reduction
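To make the model/feature distinction concrete, here is a minimal sketch of a linear (logistic) model scoring one job posting from a category feature, a numeric feature, and a term vector. The vocabularies and weights are hypothetical values chosen for illustration; they are not LinkedIn's features or a trained model.

```python
import math

# Hypothetical vocabularies for the category and term-vector features.
CATEGORIES = ["accountant job", "programming job"]
TERMS = ["dev ops", "python", "ledger"]


def featurize(category: str, years_experience: float, text: str) -> list:
    """Build one feature vector: one-hot category + numeric + term indicators."""
    one_hot = [1.0 if category == c else 0.0 for c in CATEGORIES]
    numeric = [float(years_experience)]
    term_vector = [1.0 if t in text.lower() else 0.0 for t in TERMS]
    return one_hot + numeric + term_vector


def predict_probability(features: list, weights: list, bias: float) -> float:
    """Logistic model: map the weighted sum of features to a probability in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up weights for illustration; a real model learns these from sample data.
WEIGHTS = [-0.4, 0.6, 0.05, 0.8, 0.3, -0.2]
BIAS = -2.0

x = featurize("programming job", 5, "Seeking a Dev Ops engineer")
p = predict_probability(x, WEIGHTS, BIAS)
print(f"{p:.0%} chance the user clicks on the job")
```

The same pattern scales to the prediction examples on the slide: the model only ever sees the feature vector, which is why the provenance and sensitivity of each feature matter for the security discussion that follows.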
Machine Learning Use Cases at LinkedIn
PYMK (People You May Know)
• Started 12 years ago
• One of the early growth mechanisms
• Heavy innovation in online serving (Gaia), Venice Compute, and Graph Convolutional Networks
Feed
• Leader in experimentation velocity: hundreds of experiments per quarter
• Custom deployment, dark canary
Job Search
• Multi-layered models
• Generalized linear ensemble
• TensorFlow embeddings
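A dark canary receives a copy of live traffic for a new model, but its output is never returned to users, so the new model can be compared against production without user impact. A minimal sketch, assuming models are plain callables and that logging score pairs is enough for comparison; the function and parameter names are hypothetical, not LinkedIn's deployment tooling.

```python
import math
from typing import Callable, List, Tuple

Model = Callable[[dict], float]


def serve_with_dark_canary(request: dict, production: Model, canary: Model,
                           shadow_log: List[Tuple[float, float]]) -> float:
    """Answer from the production model and score the same request on the canary.

    The canary's score is only logged for offline comparison; users never see it.
    """
    prod_score = production(request)
    try:
        canary_score = canary(request)
    except Exception:
        # A failing canary must never affect the user-facing response.
        canary_score = math.nan
    shadow_log.append((prod_score, canary_score))
    return prod_score
```

In a real serving stack the canary call would typically run asynchronously so it adds no latency; this sketch calls it inline for brevity.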
Pro-ML Initiative
• Tooling to manage all aspects of machine learning
• Unified product and frameworks to reduce routine work
• Goal: double the productivity of machine learning engineers
Phases in the Machine Learning Pipeline and Pro-ML
Pipeline phases: Experimentation (model training and evaluation), Model Deployment, Model Serving and Inference
Pro-ML components: Pro-ML Workspace App, Plugins/SDKs/Pipeline Framework, Feature Engineering, Model Training Compute, Hyperparameter Tuning, Model Scoring and Selection, Model Explorer, Model Artifact Store, Model Registry, Model Deployment Executor, Model Runtime Environment, Model Inference Engine, Model Health Assurance, AI Metadata Hub, Data Store
Primary security concerns
• Data exfiltration
• Unauthorized data access
Data Storage and Management Infrastructure
• Data sources: Espresso, Oracle DB, 3rd-party services through GAAP
• Data ingestion: Gobblin
• Data storage: HDFS, Venice, Espresso
• Dataset access management layer: Dali View
Compute Orchestration Infrastructure
• A/B testing
• Cluster management: YARN, K8s master
• Compute engines
• Workflow orchestration: Azkaban
• Use cases: Relevance, Analytics, Reporting
Pro-ML Model Deployment
Pro-ML Use Case Onboarding
• Jobs Response Prediction Service
• ATC/notifications
• OASIS/ads
ML Pipeline Security Challenges
• Experimentation phase: unauthorized data access, sensitive data leakage
• Model training phase: sensitive data generation and leakage
• Deployment phase: unauthorized model actions, leakage of sensitive models
• Inference phase: denial of service (DoS), security misconfiguration, membership inference, vulnerabilities
Security Controls and Patterns
Experimentation
• Access controls
• Encryption
• Privacy-preserving libraries
• Feature sensitivity annotation
Model Training
• Encryption
• Authenticated, authorized, and automated flows
• Cleanup of sensitive intermediate datasets
• ML classifiers
Model Deployment
• Dual verification and multi-factor authentication
• Model randomization
• Use of synthetic data
Model Inference
• Visibility into workloads
• Segregation of workloads
• Secure configuration
• Model health assurance
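Feature sensitivity annotation can be pictured as attaching a sensitivity level to every feature in the feature metadata, then checking a caller's clearance before features are released for experimentation. A minimal sketch; the levels, feature names, and clearance model are hypothetical, not Pro-ML's actual metadata schema.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PII = 3


# Hypothetical annotations, as a centralized feature metadata system might store them.
FEATURE_ANNOTATIONS = {
    "job_category": Sensitivity.PUBLIC,
    "years_experience": Sensitivity.INTERNAL,
    "member_email": Sensitivity.PII,
}


def authorized_features(requested: list, clearance: Sensitivity) -> list:
    """Return the requested features if the caller may read all of them.

    Unannotated features are treated as PII (deny by default), so a missing
    annotation can never silently widen access.
    """
    denied = [f for f in requested
              if FEATURE_ANNOTATIONS.get(f, Sensitivity.PII) > clearance]
    if denied:
        raise PermissionError(f"insufficient clearance for: {sorted(denied)}")
    return list(requested)
```

Rejecting the whole request, rather than silently dropping the denied features, keeps experiments from training on a feature set that differs from what the engineer asked for.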
Examples of security controls in the Pro-ML deployment phase and runtime environment
• Publication requires read access by a privileged account (opt-in policy)
• Runtime access to wormhole for reading model artifacts
  • Current: encrypted keytab
  • Future: service principals using KSudo
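The publication control above can be read as a conjunction of three checks: the model owner has opted in, the publishing account is privileged, and that account holds read access to the artifact. A minimal sketch under that reading; the slide does not spell out how the checks combine, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ModelArtifact:
    name: str
    opted_in: bool                                  # owner opted in to publication
    readers: Set[str] = field(default_factory=set)  # accounts granted read access


@dataclass
class Account:
    name: str
    privileged: bool


def can_publish(model: ModelArtifact, publisher: Account) -> bool:
    """Allow publication only when the model is opted in, the publisher is
    privileged, and the publisher can read the artifact."""
    return (model.opted_in
            and publisher.privileged
            and publisher.name in model.readers)
```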
Dev and EI validation
• In theory, trained models should work for inference just as they did during training.
• In reality, the code that uses the models can be non-trivial, and developers need access to models to test in development environments.
How to allow models in EI and DEV
• Validate that the models contain no PII
• Obfuscation
• Randomization
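One way to approach "validate that the models contain no PII" is to scan the string content a model carries (feature vocabularies, lookup tables) for PII-like values before the model is copied to EI or DEV. A minimal sketch; it assumes that string content has already been extracted from the artifact, and its two regular expressions are illustrative only, far from complete PII detection.

```python
import re

# Illustrative patterns only; production PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}


def find_pii(strings: list) -> list:
    """Return (kind, value) pairs for every PII-looking value in the strings."""
    hits = []
    for s in strings:
        for kind, pattern in PII_PATTERNS.items():
            hits.extend((kind, match) for match in pattern.findall(s))
    return hits


def validate_no_pii(strings: list) -> None:
    """Raise if any PII-looking value is found, blocking promotion to EI/DEV."""
    hits = find_pii(strings)
    if hits:
        raise ValueError(f"model contains possible PII: {hits}")
```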
Challenges: Heterogeneous authentication controls for the offline and online worlds
Offline grid
• Components: Name Node, Data Node, distributed ML jobs, ML job scheduler
• Credentials: Kerberos, delegation tokens, block access tokens
Online
• Service A and Service B communicate over mutual TLS
• Credentials: Kerberos, X.509 certificates
Security Control Pattern: Heterogeneous authentication and authorization control pattern
Components: Translator Service, Identity Management System, Secret Store, Distributed Compute, Distributed Storage, Service A, Service B, Web Server
• Segregation of compute and storage to remove tight coupling
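The translator service can be read as the component that maps an identity from one world to the other. A minimal sketch of one direction (online X.509 to offline Kerberos), assuming the identity management system can be modeled as a lookup table from certificate common name to Kerberos service name. The class, method, and realm names are hypothetical, and a real translator would obtain credentials such as delegation tokens through the secret store instead of returning a principal string.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OnlineIdentity:
    """Identity of an online service, taken from its mutual-TLS client certificate."""
    cert_subject_cn: str


class TranslatorService:
    """Hypothetical translator from online (X.509) to offline (Kerberos) identities."""

    def __init__(self, identity_map: Dict[str, str], realm: str):
        # Stand-in for the identity management system: certificate CN -> service name.
        self._identity_map = identity_map
        self._realm = realm

    def to_kerberos_principal(self, identity: OnlineIdentity) -> str:
        service = self._identity_map.get(identity.cert_subject_cn)
        if service is None:
            # Unknown online identities get no offline access (fail closed).
            raise PermissionError(f"no offline identity for {identity.cert_subject_cn!r}")
        return f"{service}@{self._realm}"
```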
Key Takeaways for Building ML Pipeline Security Patterns
Segregation of Infrastructure
• Segregation of storage and computation
• Segregation based on workload sensitivity
• Control plane and data plane components
AI Metadata System
• Security threats and requirements at model training and inference time
• Centralized feature metadata system
Monitoring
• Continuous monitoring
• Scanning
• Security metrics
Security Infrastructure
• Efficient identity management platform
• Wrappers and access layers
• Scalable key management system
Security Control Scaling
• Engineer and operationalize the automation of the security controls
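Segregation based on workload sensitivity can be sketched as placing each job on the cluster pool that matches its most sensitive input dataset. The pool names, dataset names, and tiers below are hypothetical, and the fail-closed default for unknown datasets is a design choice of this sketch rather than something the slides specify.

```python
# Hypothetical cluster pools, ordered from least to most restricted.
POOLS = ["general", "restricted", "high-sensitivity"]

# Hypothetical dataset sensitivity tiers (index into POOLS).
DATASET_TIER = {
    "public_job_postings": 0,
    "engagement_logs": 1,
    "member_profiles_raw": 2,
}


def choose_pool(input_datasets: list) -> str:
    """Place a workload on the pool matching its most sensitive input.

    Unknown datasets get the most restrictive tier (fail closed); a job with
    no inputs defaults to the general pool.
    """
    unknown_tier = len(POOLS) - 1
    tier = max((DATASET_TIER.get(d, unknown_tier) for d in input_datasets), default=0)
    return POOLS[tier]
```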
Thank You
Contact the presenters at
• amajumdar@linkedin.com
• tgoetze@linkedin.com

SRE[in]con 2019

  • 1. Building Secure and Scalable Machine Learning Pipelines: Challenges and security Patterns Anamitra Dutta Majumdar Thomas Goetze
  • 2. Talk Outline ML Use cases at LinkedIn Phases in ML pipelines and Infrastructure Security Risks and Security Solution Patterns Scalability and Security Control challenges.
  • 3. Acknowledgements SRE, Data APA , Foundation and AI teams for partnering in developing security controls for the different phases of the machine learning pipelines. House Security Team for spearheading some of the cutting-edge security initiatives at LinkedIn to scale the pace of business innovation in a secure manner. Marius Seritan from Relevance Infra team for helping us put the slides together during the initial phases of preparation
  • 4. Machine Learning Model • Trained on sample data • Used to make a statistical prediction • 23% chance user will click on job • 10% chance non-professional content Feature • Input to a model • Category (accountant job, programming job) • Numeric (years of experience) • Term Vectors (contains "dev ops") • Construction • Reduction
  • 5. Machine Learning Use Cases at LinkedIn • started 12 years ago • one of the early growth mechanisms • Heavy innovation in online serving (Gaia), Venice Compute, Graph Convolutional Network PYMK • Leader in experimentation velocity, hundreds of experiments per quarter • Custom deployment, dark canaryFeed • Multi layered models • Generalized linear ensemble • TensorFlow embeddings Job Search
  • 6. Pro-ML Initiative: tooling to manage all aspects of machine learning; a unified product and frameworks to reduce routine work. Goal: double the productivity of machine learning engineers.
  • 7. Phases in the Machine Learning Pipeline and Pro-ML. Primary security concerns: data exfiltration, unauthorized data access. Components: Feature Engineering, Model Artifact Store, Model Health Assurance, Model Training Compute, Model Deployment Executor, Model Scoring and Selection, Hyperparameter Tuning, Model Explorer, Model Registry, Model Inference Engine, AI Metadata Hub, Pro-ML Workspace, App Plugins/SDKs/Pipeline Framework, Model Runtime Environment, Data Store. Phases: Experimentation (Model Training and Evaluation); Model Deployment; Model Serving and Inference.
  • 8. Data Storage and Management Infrastructure: data sources (Espresso, Oracle DB, 3rd-party services through GAAP); data ingestion (Gobblin); data storage (HDFS, Venice, Espresso); dataset access management layer (Dali View). Compute Orchestration Infrastructure: cluster management (YARN, K8s master); workflow orchestration (Azkaban); compute engines. Use cases: relevance, analytics, reporting, A/B testing.
  • 10. Pro-ML Use Case Onboarding: Jobs, Response Prediction Service, ATC/notifications, OASIS/ads.
  • 11. ML Pipeline Security Challenges. Experimentation phase: unauthorized data access, sensitive data leakage. Model training phase: sensitive data generation and leakage. Deployment phase: unauthorized model actions, leakage of sensitive models. Inference phase: DoS, security misconfiguration, member inference, vulnerabilities.
  • 12. Security Controls and Patterns. Experimentation: access controls; encryption; privacy-preserving libraries; feature sensitivity annotation. Model Training: encryption; authenticated, authorized, and automated flows; cleanup of sensitive intermediate datasets; ML classifiers. Model Deployment: dual verification and multi-factor authentication; model randomization; use of synthetic data. Model Inference: visibility into workloads; segregation of workloads; secure configuration; model health assurance.
  • 13. Examples of Security Controls in Pro-ML: Deployment Phase and Runtime Environment. Publication requires read access by a privileged account (opt-in policy). Runtime access to wormhole for reading model artifacts: currently an encrypted keytab; in the future, service principals using KSudo.
  • 14. Dev and EI Validation. In theory, trained models should work for inference just as they did in training. In reality, the code that uses the models can be non-trivial, and developers need access to models to test them in development environments. How to allow models in EI and DEV: validate that the models contain no PII; apply obfuscation and randomization.
  • 15. Challenges: Heterogeneous authentication controls for the offline and online worlds. Offline (Grid): Name Node, Data Node, distributed ML jobs, ML job scheduler; authenticated with Kerberos, delegation tokens, and block access tokens. Online: Service A and Service B; authenticated with mutual TLS (X.509 certificates) and Kerberos.
  • 16. Security Control Pattern: Heterogeneous authentication and authorization control pattern. Components: translator service, identity management system, secret store, distributed compute, distributed storage, Service A, Service B.
  • 17. Segregation of Compute and Storage to remove Tight Coupling Web Server
  • 18. Key Takeaways for Building ML Pipeline Security Patterns. Segregation of infrastructure: segregation of storage and computation; segregation based on workload sensitivity; control plane and data plane components. AI metadata system: model training and inference time security threats and requirements; centralized feature metadata system. Monitoring: continuous monitoring; scanning; security metrics. Security infrastructure: efficient identity management platform; wrappers and access layers; scalable key management system. Security control scaling: engineer and operationalize the automation of the security controls.
  • 19. Thank You Contact the presenters at • amajumdar@linkedin.com • tgoetze@linkedin.com
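The access controls and feature sensitivity annotation listed on slide 12, together with the role-plus-rule approach described in the editor's notes (roles fixed when the authenticator is issued, rules evaluated against dataset attributes stored in a metadata hub, and enforcement in a data access layer), can be sketched roughly as follows. This is a minimal, hypothetical illustration under stated assumptions and not LinkedIn's implementation: the sensitivity levels, the role names, and the `Sensitivity`, `Token`, `MetadataHub`, and `DataAccessLayer` classes are all invented for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels annotated on datasets in a metadata hub."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PII = 3


@dataclass(frozen=True)
class Token:
    """Authenticator issued by the identity system; roles are fixed at issue time."""
    principal: str
    roles: frozenset


@dataclass
class MetadataHub:
    """Hypothetical store of dataset attributes (sensitivity annotations)."""
    attributes: dict = field(default_factory=dict)

    def sensitivity(self, dataset: str) -> Sensitivity:
        # Unannotated datasets are treated as the most restrictive level.
        return self.attributes.get(dataset, Sensitivity.PII)


# Hypothetical rule table: the highest sensitivity level each role may read.
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "ml-training-service": Sensitivity.CONFIDENTIAL,
    "pii-approved-service": Sensitivity.PII,
}


class DataAccessLayer:
    """Single enforcement point: every dataset read goes through can_read()."""

    def __init__(self, hub: MetadataHub):
        self.hub = hub

    def can_read(self, token: Token, dataset: str) -> bool:
        level = self.hub.sensitivity(dataset)
        # Rule: allow if any role on the token is cleared for the dataset's level.
        clearances = [ROLE_CLEARANCE[r] for r in token.roles if r in ROLE_CLEARANCE]
        return any(c >= level for c in clearances)


if __name__ == "__main__":
    hub = MetadataHub({"jobs.clicks": Sensitivity.INTERNAL,
                       "members.profile": Sensitivity.PII})
    dal = DataAccessLayer(hub)
    trainer = Token("svc:ml-trainer", frozenset({"ml-training-service"}))
    print(dal.can_read(trainer, "jobs.clicks"))      # True
    print(dal.can_read(trainer, "members.profile"))  # False
```

Because the rules key on dataset attributes rather than on individual datasets, annotating a new dataset in the metadata hub is enough to bring it under policy. Defaulting unannotated datasets to the most restrictive level is one possible design choice that makes the layer fail closed.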

Editor's Notes

  1. Notes: The talk covers the major use cases of ML for solving business problems; the high-level phases in ML pipelines; the security risks and security solution patterns in each phase; and the scalability challenges in applying security controls to the various distributed systems involved in the pipeline.
  2. These are the high-level phases in any machine learning pipeline, with each phase acting on data that has primarily been collected from events generated by member activity. In the experimentation phase, data scientists perform data exploration, problem formulation, feature engineering, and model authoring. In the model training phase, the model is trained on a set of features, and the best model is selected and pushed to the model artifact store. In the model deployment phase, the tooling notifies an online service when a new model is available for serving. In the last phase, the model is fetched from the model artifact store and loaded onto the deployment target, typically an online service, for inference. One product initiative unifies all of these workflows within a single pane of glass, with the goal of boosting productivity. Security is a cross-cutting concern in each of the layers and phases. Applying controls is challenging when full service or user attribution is required across phases and infrastructure. Rule-based authorization is enforced at the data access layer.
  3. The analytics infrastructure deals with data coming from various sources, which is ingested by highly efficient data ingestion platforms, stored on our petabyte-scale HDFS, and abstracted as cluster-agnostic logical datasets.
  4. Risks. Model experimentation: unauthorized data access at various points and phases of the pipeline; data leakage through automated flows into less secure regions of the environment; combination of two or more non-PII datasets to generate PII features. Model training phase: leakage of sensitive data generated in an intermediate training or feature transformation step; models learning PII. Model deployment: unauthorized model actions, such as model publication to production environments; use of vulnerable components, whose effect is immediately amplified; movement of sensitive models into less sensitive areas of the infrastructure. Inference phase: model runtime misconfiguration or use of vulnerable components, leading to resource exhaustion and hence DoS; authentication of the model runtime environment to fetch models; model performance degradation.
  5. Access controls: Dataset access controls are implemented based on roles and rules. Roles are enforced when generating an authenticator, such as a token, using a single source of identity truth to determine entitlements; rules are enforced based on the attributes of the target. The pattern is to enforce such rules in a data access layer and to set attributes on the dataset in a metadata hub, with the rules based on those attributes. Compute engines for the training and inference phases should authenticate and authorize to the data using system-level identities. Encryption: Sensitive data should be encrypted at rest and in motion. Where encryption cannot be used, the data should be decrypted only for the shortest possible period while in use. It is worth mentioning that there are experimental efforts underway on model training over homomorphically encrypted data. Privacy-preserving libraries should be used at training time to ensure that PII is not leaked through the combination of non-PII fields; the pattern is to integrate these tools and libraries into the model training phase of the pipeline. Model deployment phase: Deployment actions such as publish and test need verification. For test deployments, randomization techniques prevent leakage of PII to less secure environments; ideally, tests should be performed using synthetic data. Model inference: Gain visibility into compute workloads of various sensitivity levels to look for abnormal traffic patterns. Segregate workloads based on the sensitivity level of the data they handle, and define key performance metrics based on threat severity. The model runtime configuration should be secure; one way to check this is to run security benchmarks. An important aspect to monitor is model accuracy degradation over time.
  6. Online services use certificates for authentication, while offline jobs use Kerberos. This leaves gaps in full user attribution within the context of an operation that spans online and offline entities. It is an issue whenever an online service needs to contact infrastructure components that rely on token-based authentication and authorization. A good example in the actual pipeline is the model runtime environment pulling model files from the model artifact store.
  7. Introduce a translator service that understands both online and offline authentication constructs. The translator service must preserve full user attribution during translation, and it must consult a single source of identity truth trusted by both the online and offline worlds. Examples are KSudo and PKInit.
  8. Scalability challenges for applying security controls due to tight coupling of compute and storage in GRID
  9. Machine learning workloads are inherently distributed. To apply security controls in a scalable manner, important security boundaries need to be determined based on the blast radius of a compromise and on data leakage and exfiltration risks. A high-level guideline is to divide the distributed systems along the following dimensions: storage tier and compute tier; control plane and data plane components; and the sensitivity level of workloads. ML domain understanding: Some security risks are very specific to the ML domain, such as model performance degradation over time or models learning PII during training, so it is important to understand the purpose of the model, the features it is trained on, and so on. Another important aspect is to maintain an AI metadata system that supports sensitivity annotation of the various types of data used in the pipeline. Such annotations are extremely useful for decisions about dataflow between different points in the pipeline. Monitoring: Monitoring at various levels and points across the pipeline phases should capture a full audit of the events performed by human and user principals. Available and scalable security infrastructure: The identity management system is the core for establishing user attribution based on various authenticators. It is imperative that the identity management system can generate identities and authenticators for all phases and, where a workflow spans phases, help with authenticator translation. A key management system is essential for storing the encryption keys, private keys, and tokens required at various levels. Both the identity management and the key management systems should be highly available, to prevent a single point of failure, and highly scalable, to meet the demand from massively distributed systems.
Security control scaling through automation: Additionally, functions such as scaling, authenticator generation based on an understanding of the target use, and detection of PII in unwanted areas need to be automated.
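The translator-service pattern from slide 16 and notes 6 and 7 can be sketched in a simplified form. This is a hypothetical model under stated assumptions, not how KSudo or PKInit actually work: an HMAC signature stands in for the Kerberos and delegation-token machinery, and `IdentityStore`, `TranslatorService`, `OfflineToken`, the certificate subject, and the TTL are illustrative names and values. What the sketch does try to show are the two requirements from note 7: the caller is resolved through a single identity store trusted by both sides, and the original principal travels inside the translated token so that attribution is preserved.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OfflineToken:
    """Hypothetical offline token; it carries the original caller for attribution."""
    principal: str      # end identity resolved from the online credential
    issuer: str         # translator that minted the token
    expires_at: float
    signature: str


class IdentityStore:
    """Stand-in for the single source of identity truth shared by both worlds."""

    def __init__(self, cert_subject_to_principal: dict):
        self._principals = dict(cert_subject_to_principal)

    def principal_for_cert(self, cert_subject: str) -> str:
        try:
            return self._principals[cert_subject]
        except KeyError:
            raise PermissionError(f"unknown certificate subject: {cert_subject}") from None


class TranslatorService:
    """Turns an online (X.509) identity into an offline token without losing attribution."""

    def __init__(self, name: str, identities: IdentityStore, key: bytes,
                 ttl_seconds: float = 600.0):
        self.name = name
        self._identities = identities
        self._key = key  # assumption: in practice this would come from a key management system
        self._ttl = ttl_seconds

    def _sign(self, principal: str, issuer: str, expires_at: float) -> str:
        message = f"{principal}|{issuer}|{expires_at!r}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def translate(self, cert_subject: str, now: Optional[float] = None) -> OfflineToken:
        now = time.time() if now is None else now
        # Resolve the caller via the shared identity store, not from the request itself.
        principal = self._identities.principal_for_cert(cert_subject)
        expires_at = now + self._ttl
        return OfflineToken(principal, self.name, expires_at,
                            self._sign(principal, self.name, expires_at))

    def verify(self, token: OfflineToken, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expected = self._sign(token.principal, token.issuer, token.expires_at)
        # Valid only if minted by this translator, untampered, and not expired.
        return (token.issuer == self.name
                and hmac.compare_digest(expected, token.signature)
                and now < token.expires_at)


if __name__ == "__main__":
    ids = IdentityStore({"CN=model-runtime": "svc:model-runtime"})
    translator = TranslatorService("translator", ids, key=b"demo-key", ttl_seconds=60)
    token = translator.translate("CN=model-runtime", now=1000.0)
    print(token.principal)                       # svc:model-runtime
    print(translator.verify(token, now=1030.0))  # True
    print(translator.verify(token, now=1061.0))  # False (expired)
```

Because every online-to-offline call path in this sketch depends on the identity store and the signing key, it also illustrates why note 9 asks for the identity management and key management systems to be highly available and scalable.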