Practical Cloud Security

  • 3,991 views
Uploaded on

 

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,991
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
97
Comments
0
Likes
3

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Practical Cloud Security Jason Chan chan@netflix.comTuesday, October 11, 2011
  • 2. Agenda • Background and Disclaimers • Netflix in the Cloud • Model-Driven Deployment Architecture • APIs, Automation, and the Security Monkey • Cloud Firewall and Connectivity Analysis • Practical Cloud Security GapsTuesday, October 11, 2011
  • 3. Background and DisclaimersTuesday, October 11, 2011
  • 4. Background and DisclaimersTuesday, October 11, 2011
  • 5. Background and Disclaimers • No cloud definitions, but . . .Tuesday, October 11, 2011
  • 6. Background and Disclaimers • No cloud definitions, but . . . • Focus on IaaSTuesday, October 11, 2011
  • 7. Background and Disclaimers • No cloud definitions, but . . . • Focus on IaaS • Netflix uses Amazon Web ServicesTuesday, October 11, 2011
  • 8. Background and Disclaimers • No cloud definitions, but . . . • Focus on IaaS • Netflix uses Amazon Web Services • Guidance should be generally applicableTuesday, October 11, 2011
  • 9. Background and Disclaimers • No cloud definitions, but . . . • Focus on IaaS • Netflix uses Amazon Web Services • Guidance should be generally applicable • Works in progress, still many problems to solve . . .Tuesday, October 11, 2011
  • 10. Netflix in the CloudTuesday, October 11, 2011
  • 11. Why is Netflix Using Cloud?Tuesday, October 11, 2011
  • 12. !"#"$%&#&($Tuesday, October 11, 2011
  • 13. !"#"$%&#&($ Netflix could not build data centers fast enoughTuesday, October 11, 2011
  • 14. !"#"$%&#&($ Netflix could not build data centers fast enough Capacity requirements accelerating, unpredictableTuesday, October 11, 2011
  • 15. !"#"$%&#&($ Netflix could not build data centers fast enough Capacity requirements accelerating, unpredictable Product launch spikes - iPhone, Wii, PS2, XBoxTuesday, October 11, 2011
  • 16. Outgrowing Data Center http://techblog.netflix.com/2011/02/redesigning-netflix-api.html Netflix API: Growth in RequestsTuesday, October 11, 2011
  • 17. Outgrowing Data Center http://techblog.netflix.com/2011/02/redesigning-netflix-api.html Netflix API: Growth in Requests 37x Growth 1/10 - 1/11Tuesday, October 11, 2011
  • 18. Outgrowing Data Center http://techblog.netflix.com/2011/02/redesigning-netflix-api.html Netflix API: Growth in Requests 37x Growth 1/10 - 1/11Tuesday, October 11, 2011
  • 19. Outgrowing Data Center http://techblog.netflix.com/2011/02/redesigning-netflix-api.html Netflix API: Growth in Requests 37x Growth 1/10 - 1/11 !"#"$%&#%( )"*"$+#,(Tuesday, October 11, 2011
  • 20. netflix.com is now ~100% CloudTuesday, October 11, 2011
  • 21. netflix.com is now ~100% Cloud Remaining components being migratedTuesday, October 11, 2011
  • 22. Netflix Model-Driven ArchitectureTuesday, October 11, 2011
  • 23. Data Center PatternsTuesday, October 11, 2011
  • 24. Data Center Patterns • Long-lived, non-elastic systemsTuesday, October 11, 2011
  • 25. Data Center Patterns • Long-lived, non-elastic systems • Push code and config to running systemsTuesday, October 11, 2011
  • 26. Data Center Patterns • Long-lived, non-elastic systems • Push code and config to running systems • Difficult to enforce deployment patternsTuesday, October 11, 2011
  • 27. Data Center Patterns • Long-lived, non-elastic systems • Push code and config to running systems • Difficult to enforce deployment patterns • ‘Snowflake phenomenon’Tuesday, October 11, 2011
  • 28. Data Center Patterns • Long-lived, non-elastic systems • Push code and config to running systems • Difficult to enforce deployment patterns • ‘Snowflake phenomenon’ • Difficult to sync or reproduce environments (e.g. test and prod)Tuesday, October 11, 2011
  • 29. Cloud PatternsTuesday, October 11, 2011
  • 30. Cloud Patterns • Ephemeral nodesTuesday, October 11, 2011
  • 31. Cloud Patterns • Ephemeral nodes • Dynamic scalingTuesday, October 11, 2011
  • 32. Cloud Patterns • Ephemeral nodes • Dynamic scaling • Hardware is abstractedTuesday, October 11, 2011
  • 33. Cloud Patterns • Ephemeral nodes • Dynamic scaling • Hardware is abstracted • Orchestration vs. manual stepsTuesday, October 11, 2011
  • 34. Cloud Patterns • Ephemeral nodes • Dynamic scaling • Hardware is abstracted • Orchestration vs. manual steps • Trivial to clone environmentsTuesday, October 11, 2011
  • 35. When Moving to the Cloud, Leave Old Ways Behind . . .Tuesday, October 11, 2011
  • 36. When Moving to the Cloud, Leave Old Ways Behind . . . Generic forklift is generally a mistakeTuesday, October 11, 2011
  • 37. When Moving to the Cloud, Leave Old Ways Behind . . . Generic forklift is generally a mistake Adapt development, deployment, and management models appropriatelyTuesday, October 11, 2011
  • 38. When Moving to the Cloud, Leave Old Ways Behind . . . Generic forklift is generally a mistake Adapt development, deployment, and management models appropriatelyTuesday, October 11, 2011
  • 39. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.htmlTuesday, October 11, 2011
  • 40. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html Perforce SCMTuesday, October 11, 2011
  • 41. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html Continuous Integration Jenkins Perforce SCMTuesday, October 11, 2011
  • 42. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html Continuous Integration Jenkins Perforce Artifactory SCM Binary RepositoryTuesday, October 11, 2011
  • 43. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html App-Specific Continuous Packages and Integration Configuration Jenkins Yum Perforce Artifactory SCM Binary RepositoryTuesday, October 11, 2011
  • 44. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html App-Specific Continuous Packages and Integration Configuration Jenkins Yum Perforce Artifactory Bakery SCM Binary Combine Base and Repository App-Specific ConfigurationTuesday, October 11, 2011
  • 45. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html App-Specific Customized, Continuous Packages and Cloud-Ready Integration Configuration Image Jenkins Yum AMI Perforce Artifactory Bakery SCM Binary Combine Base and Repository App-Specific ConfigurationTuesday, October 11, 2011
  • 46. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html App-Specific Customized, Continuous Packages and Cloud-Ready Integration Configuration Image Jenkins Yum AMI Perforce Artifactory Bakery ASG SCM Binary Combine Base and Dynamic Repository App-Specific Scaling ConfigurationTuesday, October 11, 2011
  • 47. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html App-Specific Customized, Continuous Packages and Cloud-Ready Integration Image Live System! Configuration Jenkins Yum AMI Instance Perforce Artifactory Bakery ASG SCM Binary Combine Base and Dynamic Repository App-Specific Scaling ConfigurationTuesday, October 11, 2011
  • 48. Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html App-Specific Customized, Continuous Packages and Cloud-Ready Integration Image Live System! Configuration Jenkins Yum AMI Instance Perforce Artifactory Bakery ASG SCM Binary Combine Base and Dynamic Repository App-Specific Scaling Configuration Every change is a new pushTuesday, October 11, 2011
  • 49. ResultsTuesday, October 11, 2011
  • 50. Results • No changes to running systemsTuesday, October 11, 2011
  • 51. Results • No changes to running systems • No CMDBTuesday, October 11, 2011
  • 52. Results • No changes to running systems • No CMDB • No systems management infrastructureTuesday, October 11, 2011
  • 53. Results • No changes to running systems • No CMDB • No systems management infrastructure • Fewer logins to prod systemsTuesday, October 11, 2011
  • 54. Impact on SecurityTuesday, October 11, 2011
  • 55. Impact on Security • File integrity monitoringTuesday, October 11, 2011
  • 56. Impact on Security • File integrity monitoring • User activity monitoringTuesday, October 11, 2011
  • 57. Impact on Security • File integrity monitoring • User activity monitoring • Vulnerability managementTuesday, October 11, 2011
  • 58. Impact on Security • File integrity monitoring • User activity monitoring • Vulnerability management • Patch managementTuesday, October 11, 2011
  • 59. APIs, Automation, and the Security MonkeyTuesday, October 11, 2011
  • 60. Common Challenges for Security EngineersTuesday, October 11, 2011
  • 61. Common Challenges for Security Engineers • Lots of data from different sources, in different formatsTuesday, October 11, 2011
  • 62. Common Challenges for Security Engineers • Lots of data from different sources, in different formats • Too many administrative interfaces and disconnected systemsTuesday, October 11, 2011
  • 63. Common Challenges for Security Engineers • Lots of data from different sources, in different formats • Too many administrative interfaces and disconnected systems • Too few options for scalable automationTuesday, October 11, 2011
  • 64. Enter the Cloud . . .Tuesday, October 11, 2011
  • 65. How do you . . .Tuesday, October 11, 2011
  • 66. How do you . . . • Add a user account?Tuesday, October 11, 2011
  • 67. How do you . . . • Add a user account? • Inventory systems?Tuesday, October 11, 2011
  • 68. How do you . . . • Add a user account? • Inventory systems? • Change a firewall config?Tuesday, October 11, 2011
  • 69. How do you . . . • Add a user account? • Inventory systems? • Change a firewall config? • Snapshot a drive for forensic analysis?Tuesday, October 11, 2011
  • 70. How do you . . . • Add a user account? • Inventory systems? • Change a firewall config? • Snapshot a drive for forensic analysis? • Disable a multi-factor authentication token?Tuesday, October 11, 2011
  • 71. How do you . . . • Add a user account? • CreateUser() • Inventory systems? • Change a firewall config? • Snapshot a drive for forensic analysis? • Disable a multi-factor authentication token?Tuesday, October 11, 2011
  • 72. How do you . . . • Add a user account? • CreateUser() • Inventory systems? • DescribeInstances() • Change a firewall config? • Snapshot a drive for forensic analysis? • Disable a multi-factor authentication token?Tuesday, October 11, 2011
  • 73. How do you . . . • Add a user account? • CreateUser() • Inventory systems? • DescribeInstances() • Change a firewall config? • AuthorizeSecurityGroup Ingress() • Snapshot a drive for forensic analysis? • Disable a multi-factor authentication token?Tuesday, October 11, 2011
  • 74. How do you . . . • Add a user account? • CreateUser() • Inventory systems? • DescribeInstances() • Change a firewall config? • AuthorizeSecurityGroup Ingress() • Snapshot a drive for forensic analysis? • CreateSnapshot() • Disable a multi-factor authentication token?Tuesday, October 11, 2011
  • 75. How do you . . . • Add a user account? • CreateUser() • Inventory systems? • DescribeInstances() • Change a firewall config? • AuthorizeSecurityGroup Ingress() • Snapshot a drive for forensic analysis? • CreateSnapshot() • Disable a multi-factor • DeactivateMFADevice() authentication token?Tuesday, October 11, 2011
  • 76. Security Monkey http://techblog.netflix.com/2011/07/netflix-simian-army.htmlTuesday, October 11, 2011
  • 77. Security Monkey http://techblog.netflix.com/2011/07/netflix-simian-army.html • Leverages cloud APIsTuesday, October 11, 2011
  • 78. Security Monkey http://techblog.netflix.com/2011/07/netflix-simian-army.html • Leverages cloud APIs • Centralized framework for cloud security monitoring and analysisTuesday, October 11, 2011
  • 79. Security Monkey http://techblog.netflix.com/2011/07/netflix-simian-army.html • Leverages cloud APIs • Centralized framework for cloud security monitoring and analysis • Certificate and cipher monitoringTuesday, October 11, 2011
  • 80. Security Monkey http://techblog.netflix.com/2011/07/netflix-simian-army.html • Leverages cloud APIs • Centralized framework for cloud security monitoring and analysis • Certificate and cipher monitoring • Firewall configuration checksTuesday, October 11, 2011
  • 81. Security Monkey http://techblog.netflix.com/2011/07/netflix-simian-army.html • Leverages cloud APIs • Centralized framework for cloud security monitoring and analysis • Certificate and cipher monitoring • Firewall configuration checks • User/group/policy monitoringTuesday, October 11, 2011
  • 82. Cloud Firewall and Connectivity AnalysisTuesday, October 11, 2011
  • 83. Analyzing Traditional FirewallsTuesday, October 11, 2011
  • 84. Analyzing Traditional Firewalls • Positioned at network chokepoints, providing optimal internetwork visibilityTuesday, October 11, 2011
  • 85. Analyzing Traditional Firewalls • Positioned at network chokepoints, providing optimal internetwork visibility • Use tools like tcpdump, NetFlow, centralized logging to gather dataTuesday, October 11, 2011
  • 86. Analyzing Traditional Firewalls • Positioned at network chokepoints, providing optimal internetwork visibility • Use tools like tcpdump, NetFlow, centralized logging to gather data • Review traffic patterns and optimizeTuesday, October 11, 2011
  • 87. AWS Firewalls (Briefly)Tuesday, October 11, 2011
  • 88. AWS Firewalls (Briefly) • “Security Group” is unit of measure for firewallingTuesday, October 11, 2011
  • 89. AWS Firewalls (Briefly) • “Security Group” is unit of measure for firewalling • Policy-driven and network-agnostic, configuration follows an instanceTuesday, October 11, 2011
  • 90. AWS Firewalls (Briefly) • “Security Group” is unit of measure for firewalling • Policy-driven and network-agnostic, configuration follows an instance • Network diagram irrelevantTuesday, October 11, 2011
  • 91. AWS Firewalls (Briefly) • “Security Group” is unit of measure for firewalling • Policy-driven and network-agnostic, configuration follows an instance • Network diagram irrelevant • Chokepoints and sniffing are not possibleTuesday, October 11, 2011
  • 92. AWS Firewalls (Briefly) • “Security Group” is unit of measure for firewalling • Policy-driven and network-agnostic, configuration follows an instance • Network diagram irrelevant • Chokepoints and sniffing are not possible • Outbound connections not filterable (!)Tuesday, October 11, 2011
  • 93. Security Group AnalysisTuesday, October 11, 2011
  • 94. Security Group Analysis • Use config and inventory to map reachabilityTuesday, October 11, 2011
  • 95. Security Group Analysis • Use config and inventory to map reachability • Leverage APIs to evaluate reachability and detect violations:Tuesday, October 11, 2011
  • 96. Security Group Analysis • Use config and inventory to map reachability • Leverage APIs to evaluate reachability and detect violations: • Security groups with no membersTuesday, October 11, 2011
  • 97. Security Group Analysis • Use config and inventory to map reachability • Leverage APIs to evaluate reachability and detect violations: • Security groups with no members • “Insecure” services (e.g. Telnet, FTP)Tuesday, October 11, 2011
  • 98. Security Group Analysis • Use config and inventory to map reachability • Leverage APIs to evaluate reachability and detect violations: • Security groups with no members • “Insecure” services (e.g. Telnet, FTP) • Rules that use “any” keywordTuesday, October 11, 2011
  • 99. Security Group Analysis • Use config and inventory to map reachability • Leverage APIs to evaluate reachability and detect violations: • Security groups with no members • “Insecure” services (e.g. Telnet, FTP) • Rules that use “any” keyword • Visualize config into data flow diagramTuesday, October 11, 2011
  • 100. Reachability & Violation AnalysisTuesday, October 11, 2011
  • 101. Connectivity AnalysisTuesday, October 11, 2011
  • 102. Connectivity Analysis • Reachability shows what “can” communicateTuesday, October 11, 2011
  • 103. Connectivity Analysis • Reachability shows what “can” communicate • What about what is communicating?Tuesday, October 11, 2011
  • 104. Connectivity Analysis • Reachability shows what “can” communicate • What about what is communicating? • Take same approach, leverage APIs for firewall and inventory and combine with host dataTuesday, October 11, 2011
  • 105. Connectivity Analysis • Reachability shows what “can” communicate • What about what is communicating? • Take same approach, leverage APIs for firewall and inventory and combine with host data • Visualize data into connectivity diagramTuesday, October 11, 2011
  • 106. Connectivity AnalysisTuesday, October 11, 2011
  • 107. ‘Practical’ Cloud Security GapsTuesday, October 11, 2011
  • 108. Common Security Product ModelTuesday, October 11, 2011
  • 109. Common Security Product Model • Examples - AV, FIM, etc.Tuesday, October 11, 2011
  • 110. Common Security Product Model • Examples - AV, FIM, etc. • “Management” station with client “nodes”Tuesday, October 11, 2011
  • 111. Common Security Product Model • Examples - AV, FIM, etc. • “Management” station with client “nodes” • Limited tagging or abstractionTuesday, October 11, 2011
  • 112. Common Security Product Model • Examples - AV, FIM, etc. • “Management” station with client “nodes” • Limited tagging or abstraction • Strong “manager” and “managed” modelTuesday, October 11, 2011
  • 113. Common Security Product Model • Examples - AV, FIM, etc. • “Management” station with client “nodes” • Limited tagging or abstraction • Strong “manager” and “managed” model • Push and pull approachesTuesday, October 11, 2011
  • 114. Common Security Product Model • Examples - AV, FIM, etc. • “Management” station with client “nodes” • Limited tagging or abstraction • Strong “manager” and “managed” model • Push and pull approaches • Per node licensingTuesday, October 11, 2011
  • 115. “Thundering Herd”Tuesday, October 11, 2011
  • 116. “Thundering Herd” • Mass deploymentsTuesday, October 11, 2011
  • 117. “Thundering Herd” • Mass deployments • “Red/Black” push - concurrent clusters of 500+ nodesTuesday, October 11, 2011
  • 118. “Thundering Herd” • Mass deployments • “Red/Black” push - concurrent clusters of 500+ nodes • Elasticity related to traffic spikesTuesday, October 11, 2011
  • 119. “Thundering Herd” • Mass deployments • “Red/Black” push - concurrent clusters of 500+ nodes • Elasticity related to traffic spikes • Licensing constraintsTuesday, October 11, 2011
  • 120. Node Ephemerality and Service AbstractionTuesday, October 11, 2011
  • 121. Node Ephemerality and Service Abstraction • Data related to individual nodes becomes less importantTuesday, October 11, 2011
  • 122. Node Ephemerality and Service Abstraction • Data related to individual nodes becomes less important • Dealing with short-lived systems, IP and ID reuseTuesday, October 11, 2011
  • 123. Node Ephemerality and Service Abstraction • Data related to individual nodes becomes less important • Dealing with short-lived systems, IP and ID reuse • Event and log archives and data relationshipsTuesday, October 11, 2011
  • 124. Resource Usage Logging and AuditingTuesday, October 11, 2011
  • 125. Resource Usage Logging and Auditing • Public-facing APIs make access controls more difficult and more importantTuesday, October 11, 2011
  • 126. Resource Usage Logging and Auditing • Public-facing APIs make access controls more difficult and more important • Programmable infrastructure needs robust logging and auditing capabilitiesTuesday, October 11, 2011
  • 127. Resource Usage Logging and Auditing • Public-facing APIs make access controls more difficult and more important • Programmable infrastructure needs robust logging and auditing capabilities • Can metering data be repurposed?Tuesday, October 11, 2011
  • 128. Identity IntegrationTuesday, October 11, 2011
  • 129. Identity Integration • Federation use casesTuesday, October 11, 2011
  • 130. Identity Integration • Federation use cases • On-instance credentialsTuesday, October 11, 2011
  • 131. “Trusted Cloud”Tuesday, October 11, 2011
  • 132. “Trusted Cloud” • Various components related to providing higher assurance/trust levels in the cloudTuesday, October 11, 2011
  • 133. “Trusted Cloud” • Various components related to providing higher assurance/trust levels in the cloud • Virtual TPM / hardware root of trustTuesday, October 11, 2011
  • 134. “Trusted Cloud” • Various components related to providing higher assurance/trust levels in the cloud • Virtual TPM / hardware root of trust • Controlled executionTuesday, October 11, 2011
  • 135. “Trusted Cloud” • Various components related to providing higher assurance/trust levels in the cloud • Virtual TPM / hardware root of trust • Controlled execution • HSM in the cloudTuesday, October 11, 2011
  • 136. Thanks! Questions? chan@netflix.com (I’m hiring!)Tuesday, October 11, 2011
  • 137. References • http://www.slideshare.net/adrianco • http://aws.amazon.com • http://techblog.netflix.com • http://nordsecmob.tkk.fi/Thesisworks/Soren %20Bleikertz.pdf • https://cloudsecurityalliance.org/ • http://www.nist.gov/itl/cloud/index.cfmTuesday, October 11, 2011