SlideShare a Scribd company logo
1 of 48
Cloud Application Security: Lessons Learned
Jason Chan
chan@netflix.com
Netflix, Inc.


 “Netflix is the world’s leading Internet television
  network with more than 33 million members in
   40 countries enjoying more than one billion
   hours of TV shows and movies per month,
            including original series . . .”

Source: http://ir.netflix.com
Me
 Director of Engineering @ Netflix
 Responsible for:
   Cloud app, product, infrastructure, ops security
 Previously:
   Led security team @ VMware
   Earlier, primarily security consulting at @stake, iSEC Partners
AppSec Challenges
Lots of Good Advice
  BSIMM
  Microsoft SDL
  SAFECode
But, what works?




  Forrester Consulting, 12/10
Especially, given phenomena such as DevOps,
cloud, agile, and the unique characteristics of an
                   organization?
Netflix Engineering Characteristics
Netflix in the Cloud - Why?




                          “Undifferentiated heavy lifting”
Netflix is now ~99% in the cloud
On the way to the cloud . . .
On the way to the cloud . . .




                                (or NoOps,
                                depending on definitions)
Some As-Is #s
  33m+ subscribers
  10,000s of systems
  100s of engineers, apps
  ~250 test deployments/day **
  ~70 production deployments/day *




    ** Sample based on one week‟s activities
Deploying Code at Netflix
A common graph @ Netflix
 Lots of watching in prime time               Not as much in early morning




             Old way - pay and provision for peak, 24/7/365

   Multiply this pattern across the dozens of apps that comprise the
                        Netflix streaming service
Solution: Load-Based Autoscaling
Autoscaling
 Goals:
   # of systems matches load requirements
   Load per server is constant
   Happens without intervention (the „auto‟ in autoscaling)
 Results:
   Clusters continuously add & remove nodes
   New nodes must mirror existing
Every change requires a new cluster push
(not an incremental change to existing systems)
Deploying code must be easy
           (it is)
Netflix Deployment Pipeline


                 RPM with
                app-specific                   VM template
                    bits                      ready to launch


                   YUM                             AMI




Perforce/Git                      Bakery                            ASG
Code change                    Base image +                      Cluster config
Config change                     RPM                           Running systems
Operational Impact
 No changes to running systems
 No systems mgmt infrastructure (Puppet, Chef, etc.)
 Fewer logins to prod
 No snowflakes
 Trivial “rollback”
Security Impact
 Need to think differently on:
    Vulnerability management
    Patch management
    User activity monitoring
    File integrity monitoring
    Forensic investigations
Org, architecture, deployment is different.
          What about security?
We‟ve adapted too.
Some principles we‟ve found useful.
Cloud Application Security: What We Emphasize
Points of Emphasis
 Integrate                  Two contexts:
                               1. Integration with your
 Make the right way easy         engineering ecosystem
 Self-service, with           2. Integration of your security
  exceptions                      controls
                             Organization
 Trust, but verify
                             SCM, build and release
                             Monitoring and alerting




                                                                 26
Integration: Base AMI Testing
 The base AMI is managed like other packages, via P4, Jenkins, etc.

 We watch the SCM directory & kick off testing when it changes

 Launch an instance of the AMI, perform vuln scan and other checks
                                         SCAN COMPLETED ALERT

                                         Site name: AMI1

                                         Stopped by: N/A

                                         Total Scan Time: 4 minutes 46 seconds

                                         Critical Vulnerabilities: 5
                                         Severe Vulnerabilities:   4
                                         Moderate Vulnerabilities: 4
Integration: Control Packaging and Installation

  From the RPM spec file of a webserver:
 Requires:   ossec cloudpassage nflx-base-harden hyperguard-enforcer



 Pulls in the following RPMs:
    HIDS agent
    Config assessment/firewall agent
    Host hardening package
    WAF
Integration: Timeline (Chronos)
 What IP addresses have been blacklisted by the WAF in
  the last few weeks?
 GET /api/v1/event?timelines=type:blacklist&start=20130125000000000

 Which security groups have changed today?
 GET /api/v1/event?timelines=type:securitygroup&start=20130206000000000
Integration: Static Analysis
  Available self-service through build environment
    FindBugs, PMD
  Jenkins plugin to display graphs and support drill
   through to results
Integration: Static Analysis
Integration: Alerting (Central Alerting Gateway)
 Single place to generate and deliver alerts
 Python, Java libraries (or JSON post)
 Ties in to PagerDuty notification/escalation system
 Permits stateful alerting and some response
 A prerequisite that our security tools will leverage
CAG Example


  import CORE.Gateway

  gw = CORE.Gateway.Gateway()


  # testcluster is a defined app with associated escalation
  # schedule in PagerDuty
  gw.send("testcluster", "normal", "Something went wrong")
Points of Emphasis
 Integrate                  Developers are lazy

 Make the right way easy
 Self-service, with
  exceptions
 Trust, but verify
Making it Easy: Cryptex
 Crypto: DDIY (“Don‟t Do It Yourself”)
 Many uses of crypto in web/distributed systems:
   Encrypt/decrypt (cookies, data, etc.)
   Sign/verify (URLs, data, etc.)
 Netflix also uses heavily for device activation, DRM
  playback, etc.
Making it Easy: Cryptex
 Multi-layer crypto system (HSM basis, scale out layer)
   Easy to use
   Key management handled transparently
   Access control and auditable operations
Making it Easy: Cloud-Based SSO
 In the AWS cloud, access to data center services is
  problematic
   Examples: AD, LDAP, DNS
 But, many cloud-based systems require authN, authZ
   Examples: Dashboards, admin UIs
 Asking developers to securely handle/accept credentials
  is also problematic
Making it Easy: Cloud-Based SSO
 Solution: Leverage OneLogin SaaS SSO (SAML) used
  by IT for enterprise apps (e.g. Workday, Google Apps)
 Uses Active Directory credentials
 Provides a single & centralized login page
    Developers don‟t accept username & password directly
 Built filter for our base server to make SSO/authN trivial
Points of Emphasis
 Integrate                  Self-service is perhaps the
                              most transformative cloud
 Make the right way easy     characteristic
 Self-service, with         Failing to adopt this for security
  exceptions                  controls will lead to friction
 Trust, but verify
Self-Service: Security Groups
 Asgard cloud orchestration tool allows developers to
  configure their own firewall rules
 Limited to same AWS account, no IP-based rules
Points of Emphasis
 Integrate                  Culture precludes traditional
                              “command and control”
 Make the right way easy     approach
 Self-service, with         Organizational desire for
  exceptions                  agile, DevOps, CI/CD blur
                              traditional security
 Trust, but verify           engagement touchpoints
Trust but Verify: Security Monkey
 Cloud APIs make verification       Includes:
  and analysis of configuration         Certificate checking
  and running state simpler             Firewall analysis
 Security Monkey created as            IAM entity analysis
  the framework for this analysis       Limit warnings
                                        Resource policy analysis
Trust but Verify: Security Monkey




                   From: Security Monkey
                   Date: Wed, 24 Oct 2012 17:08:18 +0000
                   To: Security Alerts
                   Subject: prod Changes Detected


                          Table of Contents:
                              Security Groups

                                      Changed Security Group


                                          <sgname> (eu-west-1 / prod)
                                           <#Security Group/<sgname> (eu-west-1 / prod)>
Trust but Verify: Exploit Monkey
  AWS Autoscaling group is unit of deployment, so
   changes signal a good time to rerun dynamic scans

 On 10/23/12 12:35 PM, Exploit Monkey wrote:

 I noticed that testapp-live has changed current ASG name from testapp-
 live-v001 to testapp-live-v002.

 I'm starting a vulnerability scan against test app from these
 private/public IPs:
 10.29.24.174
Trust but Verify: ELB Checker (gauntlt)
 AWS Elastic Load Balancer (ELB) provides cross-
  datacenter traffic balancing, but no security controls
    If your cluster is attached to an ELB, it is available to the Internet
 Engineers may misunderstand:
    ELB use cases (and alternatives)
    Security features
    Other measures used to protect ELB-fronted clusters
Trust but Verify: ELB Checker (gauntlt)
1. Launch gauntlt test runner instance,
   loaded with “master list” of ELBs and
   expected state

2. Determine “target list” of current ELBs
   to evaluate

3. Generate per-ELB listener gauntlt
   attack files

4. Execute attacks

5. Alert on failures and new ELBs

6. Triage findings and update master list
Takeaways
  Netflix runs a large, dynamic service in AWS

  Newer concepts like cloud & DevOps need an
   updated approach to security

  Specific context can help jumpstart a pragmatic
   and effective security program

  Don‟t swim upstream - integrate and collaborate
   with your engineering partners
Questions?




             chan@netflix.com

More Related Content

What's hot

What's hot (20)

AWS re:Invent 2016: DevOps on AWS: Advanced Continuous Delivery Techniques (D...
AWS re:Invent 2016: DevOps on AWS: Advanced Continuous Delivery Techniques (D...AWS re:Invent 2016: DevOps on AWS: Advanced Continuous Delivery Techniques (D...
AWS re:Invent 2016: DevOps on AWS: Advanced Continuous Delivery Techniques (D...
 
Putting it All Together: Securing Systems at Cloud Scale
Putting it All Together: Securing Systems at Cloud ScalePutting it All Together: Securing Systems at Cloud Scale
Putting it All Together: Securing Systems at Cloud Scale
 
Deep Dive on Elastic Load Balancing
Deep Dive on Elastic Load BalancingDeep Dive on Elastic Load Balancing
Deep Dive on Elastic Load Balancing
 
Accelerating and Securing your Applications in AWS. In-depth look at Solving ...
Accelerating and Securing your Applications in AWS. In-depth look at Solving ...Accelerating and Securing your Applications in AWS. In-depth look at Solving ...
Accelerating and Securing your Applications in AWS. In-depth look at Solving ...
 
Operations: Security
Operations: SecurityOperations: Security
Operations: Security
 
Leveraging elastic web scale computing with AWS
 Leveraging elastic web scale computing with AWS Leveraging elastic web scale computing with AWS
Leveraging elastic web scale computing with AWS
 
Serverless Security: What's Left to Protect?
Serverless Security: What's Left to Protect?Serverless Security: What's Left to Protect?
Serverless Security: What's Left to Protect?
 
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
 
Automating Event Driven Security in the AWS Cloud
Automating Event Driven Security in the AWS CloudAutomating Event Driven Security in the AWS Cloud
Automating Event Driven Security in the AWS Cloud
 
Keynote - Cloudy Vision: How Cloud Integration Complicates Security
Keynote - Cloudy Vision: How Cloud Integration Complicates SecurityKeynote - Cloudy Vision: How Cloud Integration Complicates Security
Keynote - Cloudy Vision: How Cloud Integration Complicates Security
 
State of Union - Containerz
State of Union - ContainerzState of Union - Containerz
State of Union - Containerz
 
Policy as code what helm developers need to know about security
Policy as code  what helm developers need to know about securityPolicy as code  what helm developers need to know about security
Policy as code what helm developers need to know about security
 
AWS re:Invent 2016: Enabling DevOps for an Enterprise with AWS Service Catalo...
AWS re:Invent 2016: Enabling DevOps for an Enterprise with AWS Service Catalo...AWS re:Invent 2016: Enabling DevOps for an Enterprise with AWS Service Catalo...
AWS re:Invent 2016: Enabling DevOps for an Enterprise with AWS Service Catalo...
 
Operations: Production Readiness
Operations: Production ReadinessOperations: Production Readiness
Operations: Production Readiness
 
Automating Software Deployments with AWS CodeDeploy
Automating Software Deployments with AWS CodeDeployAutomating Software Deployments with AWS CodeDeploy
Automating Software Deployments with AWS CodeDeploy
 
Azure Security Center
Azure Security CenterAzure Security Center
Azure Security Center
 
Power of the Cloud - Introduction to Microsoft Azure Security
Power of the Cloud - Introduction to Microsoft Azure SecurityPower of the Cloud - Introduction to Microsoft Azure Security
Power of the Cloud - Introduction to Microsoft Azure Security
 
Shared Security Responsibility for the Azure Cloud
Shared Security Responsibility for the Azure CloudShared Security Responsibility for the Azure Cloud
Shared Security Responsibility for the Azure Cloud
 
AWS Security Strategy
AWS Security StrategyAWS Security Strategy
AWS Security Strategy
 
Defending Serverless Infrastructure in the Cloud RSAC 2020
Defending Serverless Infrastructure in the Cloud RSAC 2020Defending Serverless Infrastructure in the Cloud RSAC 2020
Defending Serverless Infrastructure in the Cloud RSAC 2020
 

Viewers also liked

From Gates to Guardrails: Alternate Approaches to Product Security
From Gates to Guardrails: Alternate Approaches to Product SecurityFrom Gates to Guardrails: Alternate Approaches to Product Security
From Gates to Guardrails: Alternate Approaches to Product Security
Jason Chan
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud Security
Jason Chan
 
Ibm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalIbm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinal
aspyker
 

Viewers also liked (20)

Cloud Application Security: Lessons Learned
Cloud Application Security: Lessons LearnedCloud Application Security: Lessons Learned
Cloud Application Security: Lessons Learned
 
From Gates to Guardrails: Alternate Approaches to Product Security
From Gates to Guardrails: Alternate Approaches to Product SecurityFrom Gates to Guardrails: Alternate Approaches to Product Security
From Gates to Guardrails: Alternate Approaches to Product Security
 
Amazon Web Services Security
Amazon Web Services SecurityAmazon Web Services Security
Amazon Web Services Security
 
Virtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesVirtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit Perspectives
 
Splitting the Check on Compliance and Security
Splitting the Check on Compliance and SecuritySplitting the Check on Compliance and Security
Splitting the Check on Compliance and Security
 
Defending Netflix from Abuse
Defending Netflix from AbuseDefending Netflix from Abuse
Defending Netflix from Abuse
 
Security at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooSecurity at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at Yahoo
 
Resilience and Security @ Scale: Lessons Learned
Resilience and Security @ Scale: Lessons LearnedResilience and Security @ Scale: Lessons Learned
Resilience and Security @ Scale: Lessons Learned
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud Security
 
Analyze System and Code Interactions
Analyze System and Code InteractionsAnalyze System and Code Interactions
Analyze System and Code Interactions
 
Ibm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalIbm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinal
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integration
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
 
Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security Automation
 
Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4
 
AWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveAWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's Perspective
 
Netflix Webkit-Based UI for TV Devices
Netflix Webkit-Based UI for TV DevicesNetflix Webkit-Based UI for TV Devices
Netflix Webkit-Based UI for TV Devices
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Netflix and Containers: Not A Stranger Thing
Netflix and Containers:  Not A Stranger ThingNetflix and Containers:  Not A Stranger Thing
Netflix and Containers: Not A Stranger Thing
 

Similar to Cloud Application Security: Lessons Learned

Similar to Cloud Application Security: Lessons Learned (20)

Security and Advanced Automation in the Enterprise
Security and Advanced Automation in the EnterpriseSecurity and Advanced Automation in the Enterprise
Security and Advanced Automation in the Enterprise
 
Integrating-Cloud-Development-Security-And-Operations.pdf
Integrating-Cloud-Development-Security-And-Operations.pdfIntegrating-Cloud-Development-Security-And-Operations.pdf
Integrating-Cloud-Development-Security-And-Operations.pdf
 
Advanced Continuous Delivery on AWS
Advanced Continuous Delivery on AWSAdvanced Continuous Delivery on AWS
Advanced Continuous Delivery on AWS
 
Security as Code: DOES15
Security as Code: DOES15Security as Code: DOES15
Security as Code: DOES15
 
Integrating_Cloud_Development_Security_And_Operations.pdf
Integrating_Cloud_Development_Security_And_Operations.pdfIntegrating_Cloud_Development_Security_And_Operations.pdf
Integrating_Cloud_Development_Security_And_Operations.pdf
 
DevOps
DevOpsDevOps
DevOps
 
Pragmatic Pipeline Security
Pragmatic Pipeline SecurityPragmatic Pipeline Security
Pragmatic Pipeline Security
 
Defcon23 from zero to secure in 1 minute - nir valtman and moshe ferber
Defcon23   from zero to secure in 1 minute - nir valtman and moshe ferberDefcon23   from zero to secure in 1 minute - nir valtman and moshe ferber
Defcon23 from zero to secure in 1 minute - nir valtman and moshe ferber
 
Toward Full Stack Security
Toward Full Stack SecurityToward Full Stack Security
Toward Full Stack Security
 
Making Security Agile
Making Security AgileMaking Security Agile
Making Security Agile
 
Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh VariaCloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
Cloud-powered Continuous Integration and Deployment architectures - Jinesh Varia
 
Launch .NET Applications in the Cloud
Launch .NET Applications in the CloudLaunch .NET Applications in the Cloud
Launch .NET Applications in the Cloud
 
DEFCON 23 - Nir Valtman and Moshe Ferber - from zero to secure in 1
DEFCON 23 - Nir Valtman and  Moshe Ferber - from zero to secure in 1DEFCON 23 - Nir Valtman and  Moshe Ferber - from zero to secure in 1
DEFCON 23 - Nir Valtman and Moshe Ferber - from zero to secure in 1
 
Cncf checkov and bridgecrew
Cncf checkov and bridgecrewCncf checkov and bridgecrew
Cncf checkov and bridgecrew
 
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023
 
Wellington MuleSoft Meetup 2021-02-18
Wellington MuleSoft Meetup 2021-02-18Wellington MuleSoft Meetup 2021-02-18
Wellington MuleSoft Meetup 2021-02-18
 
Automating Security in Cloud Workloads with DevSecOps
Automating Security in Cloud Workloads with DevSecOpsAutomating Security in Cloud Workloads with DevSecOps
Automating Security in Cloud Workloads with DevSecOps
 
Building an Automated Security Fabric in AWS
Building an Automated Security Fabric in AWSBuilding an Automated Security Fabric in AWS
Building an Automated Security Fabric in AWS
 
Infrastructure Provisioning & Automation For Large Enterprises
Infrastructure Provisioning & Automation For Large EnterprisesInfrastructure Provisioning & Automation For Large Enterprises
Infrastructure Provisioning & Automation For Large Enterprises
 
Transforming Software Development
Transforming Software DevelopmentTransforming Software Development
Transforming Software Development
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

Cloud Application Security: Lessons Learned

  • 1. Cloud Application Security: Lessons Learned Jason Chan chan@netflix.com
  • 2. Netflix, Inc. “Netflix is the world’s leading Internet television network with more than 33 million members in 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series . . .” Source: http://ir.netflix.com
  • 3. Me  Director of Engineering @ Netflix  Responsible for:  Cloud app, product, infrastructure, ops security  Previously:  Led security team @ VMware  Earlier, primarily security consulting at @stake, iSEC Partners
  • 5. Lots of Good Advice  BSIMM  Microsoft SDL  SAFECode
  • 6. But, what works? Forrester Consulting, 12/10
  • 7. Especially, given phenomena such as DevOps, cloud, agile, and the unique characteristics of an organization?
  • 9. Netflix in the Cloud - Why? “Undifferentiated heavy lifting”
  • 10. Netflix is now ~99% in the cloud
  • 11. On the way to the cloud . . .
  • 12. On the way to the cloud . . . (or NoOps, depending on definitions)
  • 13. Some As-Is #s  33m+ subscribers  10,000s of systems  100s of engineers, apps  ~250 test deployments/day **  ~70 production deployments/day * ** Sample based on one week‟s activities
  • 14. Deploying Code at Netflix
  • 15. A common graph @ Netflix Lots of watching in prime time Not as much in early morning Old way - pay and provision for peak, 24/7/365 Multiply this pattern across the dozens of apps that comprise the Netflix streaming service
  • 17. Autoscaling  Goals:  # of systems matches load requirements  Load per server is constant  Happens without intervention (the „auto‟ in autoscaling)  Results:  Clusters continuously add & remove nodes  New nodes must mirror existing
  • 18. Every change requires a new cluster push (not an incremental change to existing systems)
  • 19. Deploying code must be easy (it is)
  • 20. Netflix Deployment Pipeline RPM with app-specific VM template bits ready to launch YUM AMI Perforce/Git Bakery ASG Code change Base image + Cluster config Config change RPM Running systems
  • 21. Operational Impact  No changes to running systems  No systems mgmt infrastructure (Puppet, Chef, etc.)  Fewer logins to prod  No snowflakes  Trivial “rollback”
  • 22. Security Impact  Need to think differently on:  Vulnerability management  Patch management  User activity monitoring  File integrity monitoring  Forensic investigations
  • 23. Org, architecture, deployment is different. What about security?
  • 24. We‟ve adapted too. Some principles we‟ve found useful.
  • 25. Cloud Application Security: What We Emphasize
  • 26. Points of Emphasis  Integrate  Two contexts: 1. Integration with your  Make the right way easy engineering ecosystem  Self-service, with 2. Integration of your security exceptions controls  Organization  Trust, but verify  SCM, build and release  Monitoring and alerting 26
  • 27. Integration: Base AMI Testing  The base AMI is managed like other packages, via P4, Jenkins, etc.  We watch the SCM directory & kick off testing when it changes  Launch an instance of the AMI, perform vuln scan and other checks SCAN COMPLETED ALERT Site name: AMI1 Stopped by: N/A Total Scan Time: 4 minutes 46 seconds Critical Vulnerabilities: 5 Severe Vulnerabilities: 4 Moderate Vulnerabilities: 4
  • 28. Integration: Control Packaging and Installation  From the RPM spec file of a webserver: Requires: ossec cloudpassage nflx-base-harden hyperguard-enforcer  Pulls in the following RPMs:  HIDS agent  Config assessment/firewall agent  Host hardening package  WAF
  • 29. Integration: Timeline (Chronos)  What IP addresses have been blacklisted by the WAF in the last few weeks?  GET /api/v1/event?timelines=type:blacklist&start=20130125000000000  Which security groups have changed today?  GET /api/v1/event?timelines=type:securitygroup&start=20130206000000000
  • 30. Integration: Static Analysis  Available self-service through build environment  FindBugs, PMD  Jenkins plugin to display graphs and support drill through to results
  • 32. Integration: Alerting (Central Alerting Gateway)  Single place to generate and deliver alerts  Python, Java libraries (or JSON post)  Ties in to PagerDuty notification/escalation system  Permits stateful alerting and some response  A prerequisite that our security tools will leverage
  • 33. CAG Example import CORE.Gateway gw = CORE.Gateway.Gateway() # testcluster is a defined app with associated escalation # schedule in PagerDuty gw.send("testcluster", "normal", "Something went wrong")
  • 34. Points of Emphasis  Integrate  Developers are lazy  Make the right way easy  Self-service, with exceptions  Trust, but verify
  • 35. Making it Easy: Cryptex  Crypto: DDIY (“Don‟t Do It Yourself”)  Many uses of crypto in web/distributed systems:  Encrypt/decrypt (cookies, data, etc.)  Sign/verify (URLs, data, etc.)  Netflix also uses heavily for device activation, DRM playback, etc.
  • 36. Making it Easy: Cryptex  Multi-layer crypto system (HSM basis, scale out layer)  Easy to use  Key management handled transparently  Access control and auditable operations
  • 37. Making it Easy: Cloud-Based SSO  In the AWS cloud, access to data center services is problematic  Examples: AD, LDAP, DNS  But, many cloud-based systems require authN, authZ  Examples: Dashboards, admin UIs  Asking developers to securely handle/accept credentials is also problematic
  • 38. Making it Easy: Cloud-Based SSO  Solution: Leverage OneLogin SaaS SSO (SAML) used by IT for enterprise apps (e.g. Workday, Google Apps)  Uses Active Directory credentials  Provides a single & centralized login page  Developers don‟t accept username & password directly  Built filter for our base server to make SSO/authN trivial
  • 39. Points of Emphasis  Integrate  Self-service is perhaps the most transformative cloud  Make the right way easy characteristic  Self-service, with  Failing to adopt this for security exceptions controls will lead to friction  Trust, but verify
  • 40. Self-Service: Security Groups  Asgard cloud orchestration tool allows developers to configure their own firewall rules  Limited to same AWS account, no IP-based rules
  • 41. Points of Emphasis  Integrate  Culture precludes traditional “command and control”  Make the right way easy approach  Self-service, with  Organizational desire for exceptions agile, DevOps, CI/CD blur traditional security  Trust, but verify engagement touchpoints
  • 42. Trust but Verify: Security Monkey  Cloud APIs make verification  Includes: and analysis of configuration  Certificate checking and running state simpler  Firewall analysis  Security Monkey created as  IAM entity analysis the framework for this analysis  Limit warnings  Resource policy analysis
  • 43. Trust but Verify: Security Monkey From: Security Monkey Date: Wed, 24 Oct 2012 17:08:18 +0000 To: Security Alerts Subject: prod Changes Detected Table of Contents: Security Groups Changed Security Group <sgname> (eu-west-1 / prod) <#Security Group/<sgname> (eu-west-1 / prod)>
  • 44. Trust but Verify: Exploit Monkey  AWS Autoscaling group is unit of deployment, so changes signal a good time to rerun dynamic scans On 10/23/12 12:35 PM, Exploit Monkey wrote: I noticed that testapp-live has changed current ASG name from testapp- live-v001 to testapp-live-v002. I'm starting a vulnerability scan against test app from these private/public IPs: 10.29.24.174
  • 45. Trust but Verify: ELB Checker (gauntlt)  AWS Elastic Load Balancer (ELB) provides cross- datacenter traffic balancing, but no security controls  If your cluster is attached to an ELB, it is available to the Internet  Engineers may misunderstand:  ELB use cases (and alternatives)  Security features  Other measures used to protect ELB-fronted clusters
  • 46. Trust but Verify: ELB Checker (gauntlt) 1. Launch gauntlt test runner instance, loaded with “master list” of ELBs and expected state 2. Determine “target list” of current ELBs to evaluate 3. Generate per-ELB listener gauntlt attack files 4. Execute attacks 5. Alert on failures and new ELBs 6. Triage findings and update master list
  • 47. Takeaways  Netflix runs a large, dynamic service in AWS  Newer concepts like cloud & DevOps need an updated approach to security  Specific context can help jumpstart a pragmatic and effective security program  Don‟t swim upstream - integrate and collaborate with your engineering partners
  • 48. Questions? chan@netflix.com