Scaling the Cloud

 Bill Burns
 Director, Information
 Security & Networking


 CISO Executive Summit
 Nov 27, 2012




Thursday, November 29, 12
Agenda


           • Netflix Background and Culture
           • Why We Moved to the Cloud
           • InfoSec Challenges, Solutions in a hybrid DataCenter/IaaS Cloud: C.I.A.

           • InfoSec Take-Aways: Running In The Cloud

Netflix Business

    • 30+ million members globally
    • Streaming in 51 countries
    • 1B hours streamed/month
    • Watched on 1000+ devices
    • 33% of US peak evening
         Internet traffic
(c) 2011 Sandvine
Background and Context

           • High Performance Culture
           • Fail Fast, Learn Fast ... Get Results
           • Some core values:
            • “Freedom & Responsibility”
            • “Loosely-Coupled, Highly-Aligned”
            • “Context not control”
Engineering-Centric Culture

           • Sought the Cloud for Availability, Capacity
            • ...and also found Agility
           • DevOps / NoOps means engineering teams own:
            • New deployments and upgrades
            • Capacity planning & procurement

Freedom & Responsibility




Demand vs Capacity

[Chart: 37x growth in 13 months, plotted against then-current DataCenter capacity]




Cloud: On-Demand Capacity

[Figure: three charts over time, numbered to match the list below: (1) Demand, (2) # Servers, (3) Utilization]

1. Demand: Typical pattern of customer requests rises & falls over time
2. Reaction: System automatically adds, removes servers to the application pool
3. Result: Overall utilization stays constant
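
The three steps above can be sketched as a small simulation. This is only an illustration, not Netflix's scaling logic: the per-server capacity, the 60% utilization target, and the names `servers_needed` and `simulate` are assumptions made up for the example.

```python
def servers_needed(demand_rps: int, capacity_rps: int, target_pct: int) -> int:
    """Smallest pool size that keeps utilization at or below target_pct percent."""
    usable_per_server = capacity_rps * target_pct  # scaled by 100 to stay in integers
    # Integer ceiling division: -(-a // b) rounds a / b upward.
    needed = -(-demand_rps * 100 // usable_per_server)
    return max(1, needed)


def simulate(demand_series, capacity_rps: int = 100, target_pct: int = 60):
    """Return (demand, servers, utilization) for each point in the demand series."""
    rows = []
    for demand in demand_series:
        servers = servers_needed(demand, capacity_rps, target_pct)  # 2. Reaction
        utilization = demand / (servers * capacity_rps)             # 3. Result
        rows.append((demand, servers, utilization))
    return rows


if __name__ == "__main__":
    # 1. Demand: a hypothetical daily pattern that rises and falls.
    for demand, servers, utilization in simulate([650, 1250, 3100, 6000, 3100, 1250, 650]):
        print(f"demand={demand:5d} rps  servers={servers:3d}  utilization={utilization:.2f}")
```

The pool size follows the demand curve, while utilization stays just at or under the target at every point in the series.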

Running In The Cloud :: InfoSec Perspective




InfoSec In The Cloud :: Harder

     1. “Your IP address attacked me yesterday. Please stop it!”
     2. Dealing with other people’s traffic at your front door
     3. Herding ephemeral instances with vendor applications
     4. Trusting endpoints, infrastructure
     5. Key management

InfoSec In our Cloud :: Easier

           1. Reacting to business velocity
           2. Detecting instance changes
           3. Application ownership, management
           4. Patching, updating
           5. Availability, in an environment you don’t control
           6. Embedding security controls
           7. Least privilege enforcement
           8. Testing/auditing for conformance
           9. Consistency, conformity in environment


InfoSec DevOps :: Staying Relevant

           • “Communication is what the listener does” – Mark Horstman, Manager Tools podcast / Peter Drucker
           • My team’s goal: an InfoSec program that adds value and is a deeper part of the business’s success, not a “bolt-on”

           • Pain: Learning a new vocabulary, systems thinking
           • End result: We like this model a lot!

InfoSec Challenges In An IaaS Cloud

           • Confidentiality
           • Integrity
           • Availability
           • Possession
           • Authenticity
           • Utility




InfoSec Challenge in an IaaS Cloud :: Availability




Availability :: Assume failures

     • You’re only good at what you regularly test for
     • If you fear a failure mode, find a way to automate a test for that
     • Chaos Monkey/Gorilla induce failures, help us practice recovery
     • Include security control systems in your failure testing too!
(c) Courtesy Flickr - Winton
The Netflix Simian Army & other Security Controls

  • Striving for continuous testing, monitoring
  • Identify and test common failure modes
  • Automation everywhere

  • Chaos Monkey - Randomly kills instances
  • Chaos Gorilla - Evacuates entire data centers
  • Janitor Monkey - Ensures a clean inventory
  • Security Monkey - Various security checks
  • Exploit Monkey - Under development
  • Critical Systems - File integrity monitoring, HIDS, WAF baked in as needed
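
As a rough illustration of "randomly kills instances" followed by practiced recovery, the sketch below runs one failure drill against an in-memory pool. It is not the actual Chaos Monkey: the `Group` class and the `heal`, `chaos_monkey`, and `run_drill` functions are hypothetical names, and no cloud API is called.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Group:
    """A hypothetical pool of instances with a desired size."""
    name: str
    desired: int
    instances: list = field(default_factory=list)
    launched: int = 0  # running counter so replacement IDs are never reused


def heal(group: Group) -> None:
    """Launch replacements until the group is back at its desired size."""
    while len(group.instances) < group.desired:
        group.launched += 1
        group.instances.append(f"{group.name}-{group.launched}")


def chaos_monkey(group: Group, rng: random.Random):
    """Terminate one randomly chosen instance and return its ID (None if empty)."""
    if not group.instances:
        return None
    victim = rng.choice(group.instances)
    group.instances.remove(victim)
    return victim


def run_drill(group: Group, rng: random.Random):
    """Induce one failure, then check that the group recovers."""
    heal(group)
    victim = chaos_monkey(group, rng)
    degraded_size = len(group.instances)
    heal(group)
    return victim, degraded_size, len(group.instances)


if __name__ == "__main__":
    api = Group(name="api", desired=3)
    victim, degraded, recovered = run_drill(api, random.Random())
    print(f"killed {victim}: pool went to {degraded}, recovered to {recovered}")
```

The same drill shape applies to security control systems: terminate the control, then assert that it comes back.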


InfoSec Challenge in an IaaS Cloud :: Integrity




Key: Automation




Integrity :: Patching

        • Goal: Running instances do not get patched
        • Alternative:
          • Bake a new AMI for any change
          • Launch, test new instances in parallel
          • Kill the old instances
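
The bake, launch-in-parallel, kill-the-old sequence can be sketched as follows. This is a minimal model under stated assumptions, not the real tooling: `bake_image`, `replace_fleet`, the image names, and the health check are all hypothetical, and instances are plain records rather than EC2 resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


def bake_image(base_image: str, change: str) -> str:
    """Stand-in for baking a new AMI: every change yields a new image ID."""
    return f"{base_image}+{change}"


def replace_fleet(old_fleet, new_image: str, health_check):
    """Launch a parallel fleet from new_image and switch only if it tests healthy."""
    new_fleet = [
        Instance(f"new-{i}", new_image) for i in range(len(old_fleet))
    ]
    if not all(health_check(instance) for instance in new_fleet):
        # New instances failed testing: discard them, leave the old fleet untouched.
        return list(old_fleet)
    # New fleet is healthy: the old instances are killed (simply dropped here).
    return new_fleet


if __name__ == "__main__":
    old = [Instance(f"old-{i}", "ami-base") for i in range(3)]
    patched = bake_image("ami-base", "openssl-fix")  # hypothetical patch name
    fleet = replace_fleet(old, patched, health_check=lambda inst: True)
    print([inst.image_id for inst in fleet])
```

Because running instances are never modified, a failed patch leaves the old fleet exactly as it was.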

Integrity :: Upgrades

     • Bake a new AMI for any change
     • Launch new instances in parallel
     • Kill the old instances

      Lesson Learned: Make the secure-and-consistent behavior the easier alternative.


Embedding Security Controls

                            • Controls baked into our templates
                              • Places controls near the data
                              • Automation ensures coverage as machines are born, replaced
                            • Security controls are “Data Center agnostic”
                              • Provide a single view of attack surface
                              • Evolving, work in progress
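
One way to picture "controls baked into our templates" is that an instance never chooses its own controls; it inherits them from the template it was launched from. The sketch below assumes hypothetical `Template` and `Instance` records and made-up control names, and it does not talk to any real cloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    image_id: str
    controls: frozenset  # e.g. {"waf", "hids", "file-integrity"}


@dataclass(frozen=True)
class Instance:
    instance_id: str
    region: str
    image_id: str
    controls: frozenset


def launch(template: Template, instance_id: str, region: str) -> Instance:
    """Every launch copies the template's controls, whatever the location."""
    return Instance(instance_id, region, template.image_id, template.controls)


if __name__ == "__main__":
    web = Template("ami-web", frozenset({"waf", "hids", "file-integrity"}))
    first = launch(web, "i-1", region="datacenter")
    replacement = launch(web, "i-2", region="cloud-region-a")
    # Same controls in both locations: the control set is location agnostic.
    print(sorted(first.controls), sorted(replacement.controls))
```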



Security Controls: WAF Example

  •   Sample Control: Web Application Firewall
      •   Software-only, baked into the AMI
      •   Control spans all environments, regions
      •   Consistent control, view
      •   Zero effort for developers to add protection



Automation = Conformity & Consistency

     • All apps, tiers are Highly Available
     • Secure defaults applied automatically
     • Replacement instances look just like the originals
     • Includes security controls

InfoSec Challenge in an IaaS Cloud :: Confidentiality/Possession




Key Management :: Cloud Hardware Security Modules (HSMs)

                            • Problem:
                              • Need crypto keys near the Cloud
                              • HSMs are in the data center
                              • Can’t entirely trust our CSP
                            • Motivation:
                              • Want to decouple DC and Cloud
                              • Want to trust our Cloud more fully
                              • If we want this, others will probably want it too.
                            • Solution:
                              • A real HSM: FIPS 140-2 certified hardware
                              • Keys stay in hardware
                              • “HSM as a Service”
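
The property "keys stay in hardware" shows up in the interface: callers hold an opaque handle and ask the module to perform operations, and key bytes are never returned. The sketch below is a software stand-in that only illustrates that interface shape. It is not an HSM, offers none of the protections of FIPS 140-2 hardware, and the `KeyModule` class and its method names are assumptions for the example.

```python
import hashlib
import hmac
import secrets


class KeyModule:
    """Software stand-in for an HSM-style service: keys never leave the object."""

    def __init__(self):
        self._keys = {}  # handle -> key bytes; private to the module

    def generate_key(self, label: str) -> str:
        """Create a key inside the module and return only an opaque handle."""
        handle = f"{label}-{secrets.token_hex(4)}"
        self._keys[handle] = secrets.token_bytes(32)
        return handle

    def sign(self, handle: str, message: bytes) -> bytes:
        """Compute an HMAC-SHA256 tag with the key behind the handle."""
        return hmac.new(self._keys[handle], message, hashlib.sha256).digest()

    def verify(self, handle: str, message: bytes, tag: bytes) -> bool:
        """Constant-time comparison of the expected tag against the supplied one."""
        return hmac.compare_digest(self.sign(handle, message), tag)


if __name__ == "__main__":
    module = KeyModule()
    handle = module.generate_key("session-signing")  # hypothetical label
    tag = module.sign(handle, b"member=42")
    print(module.verify(handle, b"member=42", tag))
```

An application written against an interface like this works the same way whether the module sits in the data center or next to the cloud workloads, which is the decoupling the slide asks for.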

InfoSec Cloud Take-Aways

      • Our cloud operations and DevOps models were disruptive to:
        • Engineering, Auditors, Vendors, and other Operations teams
      • Our InfoSec team:
        • Learned new cloud operational approaches, techniques, our PaaS
        • Wrote/consumed APIs and services, learned a new AWS alphabet soup
        • Had to tweak most software to fit this model; easier to start cloud first
        • Worked with partners to implement new security controls

Thank you!


                                          @x509v3
                            Bill.Burns@Netflix.com



