Incident Response in the Cloud - SID319 - re:Invent 2017

In this session, we walk you through a hypothetical incident response managed on AWS. Learn how to apply existing best practices as well as how to leverage the unique security visibility, control, and automation that AWS provides. We cover how to set up your AWS environment to prevent a security event and how to build a cloud-specific incident response plan so that your organization is prepared before a security event occurs. This session also covers specific environment recovery steps available on AWS.

  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent: Incident Response in the Cloud. Jim Jennis, Principal Solutions Architect, AWS. Conrad Fernandes, Cloud Cyber Security Lead, JHUAPL. November 30, 2017. SID319
  2. 2. Incident Response in the Cloud • Basics – Events and incidents – Things that “go bump” • Incident response in the cloud – Best practices • Case study: JHUAPL – Information spillage incident response
  3. 3. Events and Incidents: So, What’s the Difference? All incidents are events, but NOT all events are incidents. Through “event management” we monitor: we use tools such as Amazon CloudWatch, AWS CloudTrail, Splunk, and others to track, monitor, analyze, and audit EVENTS. If event management identifies an event that is analyzed and qualified as an incident, that “qualifying event” triggers the registration of an incident, starts the incident management process, and initiates any required response actions.
  4. 4. What Is an Incident? An unplanned interruption to an IT service or reduction in the quality of an IT service Failure of a configuration item that has not yet affected service is also an incident
  5. 5. Incident Response in the Cloud Bottom line up front: Incident response can be complex but there is NO ROCKET SCIENCE INVOLVED OR NEEDED!
  6. 6. Certifications and accreditations for different regimes. Security is a shared responsibility. Ownership and Control.
  7. 7. Domains and Scope of Incident Response Understanding the Interfaces and Boundaries • Incident response of the customer “in the cloud” • Incident response of the CSP for the cloud infrastructure and services it provides • Joint coordinated incident response of the customer and CSP in cooperation with one another • Joint coordinated incident response of multiple CSPs and/or customers in cooperation with each other
  8. 8. IR Policy, Process, Procedure: Example Applicable and Governing Standards. Example governance in the federal community: • NIST 800-53/NIST 800-171 • NIST 800-61 • FedRAMP Incident Communications Procedure • FedRAMP Continuous Monitoring Strategy Guide • CJCSM 6510.01B • DoD 8530.1/8530.2 – DoD PKI (aka “CAC”) required. Homework assignment
  9. 9. IR Stakeholders: Your Involvement
  10. 10. Governance: Insert your policy, process, procedure. Information Security & Identity Management Committee
  11. 11. Example Incident Response (IR) Controls. For example: FedRAMP/NIST requires CSPs and implementers to develop an incident response plan that describes how they manage security incidents for the system and address the incident response (IR) family of security controls below: • IR-1 Incident Response Policy and Procedures • IR-2 Incident Response Training • IR-3 Incident Response Testing & Exercises • IR-4 Incident Handling • IR-5 Incident Monitoring • IR-6 Incident Reporting • IR-7 Incident Response Assistance • IR-8 Incident Response Plan • IR-9 Information Spillage Response
  12. 12. Incident Response in the Cloud Am I prepared for incidents and failures? Both are guaranteed to happen!
  13. 13. Topics and Best Practices • Is incident response in the cloud different? • Building your IR policies, governance, plans, and procedures/run books • Preparation, training, and execution • IR for continuous improvement • IR exercises – training the way you fight • Iterate and automate! • Engaging your CSP (AWS)
  14. 14. Best Practices Building Robust Incident Response in the Cloud
  15. 15. Incident Response: Cloud Considerations • Cloud is different and IR in the cloud is different! Easier, faster, cheaper, more effective! • Your ability to detect, react, and recover can be greatly enhanced by leveraging the cloud • Many capabilities for investigation are ONLY possible in the cloud • IR in the cloud is NOT just about being reactive!
  16. 16. Leveraging IR to improve your security posture in the cloud Let’s look at some best practices…
  17. 17. Integrate Incident Response with Continuous Improvement: Establish control → Determine impact → Recover as needed → Investigate root cause → Implement improvement → Iterate! Think: NTSB incident/accident investigation and recommendations, but geared to your AWS environment!
  18. 18. Preparation – Being Proactive! • Architect for failure and IR throughout • Implement clear, lightweight governance and ownership. • Architect and build for speed, agility, security, and integrity • Implement clear, simple controls and run books for responders • Leverage principle of “least privilege” throughout • Validate readiness and run tests continually • Consider “chaos engineering”
  19. 19. Clear Ownership and Governance • Tools to identify resources and find owners and administrators – Tags are your friends...remember the power of a mission focused tagging taxonomy rigorously enforced! • Procedures to engage owners and administrators • Procedures to engage your CSP (AWS) • Don’t create policies and procedures you are not willing and able to enforce!
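Finding resources without an owner is one place where a tagging taxonomy pays off. The following is a minimal sketch, assuming a hypothetical required tag key `Owner` (the slides do not prescribe a taxonomy) and a client object that behaves like `boto3.client("ec2")`:

```python
def instances_missing_tag(ec2, required_tag="Owner"):
    """Return the IDs of EC2 instances that lack the required tag key.

    `ec2` is assumed to behave like boto3.client("ec2"). The tag key
    "Owner" is a hypothetical example, not something the slides define.
    """
    missing = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                # Instances with no tags at all have no "Tags" entry.
                tag_keys = {tag["Key"] for tag in instance.get("Tags", [])}
                if required_tag not in tag_keys:
                    missing.append(instance["InstanceId"])
    return missing
```

A scheduled job could run a check like this and notify a governance channel, which keeps the "rigorously enforced" part of the taxonomy cheap to maintain.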
  20. 20. Take Advantage of Your CSP for IR in the Cloud • “How can I leverage all the tools the CSP (AWS) makes available to me?” • “When do I need to engage my CSP (AWS) for support?”
  21. 21. Apply DEV-SEC-OPS to Incident Response… • Leverage what you ALREADY DO WELL! • Start small and grow incrementally • Build an IR “flight simulator” in the Cloud • Schedule IR scenario planning and prioritization sessions • Run your first incident response simulation (IRS) in the cloud—AWS can help! Iterate and improve! Build/run another IRS…improve Build/run another IRS…improve…repeat
  22. 22. Building an IRS Scenario Catalog Incident Response Simulation How To • Identify an issue of importance (historical or “What if?”) • Leverage skilled users, security, and operations people • Build a realistic simulation • Invite other stakeholders • Run the simulation live • Complete an after action “hot wash” • Identify how to improve and repeat! • AWS is here to help!
  23. 23. Good Incident Response Is NO ACCIDENT! Build for Speed, Agility, and Security… Use the Ecosystem! • Real-time metrics and automation, everywhere! • Lightweight governance with delegation of decision-making/enforcement • Develop thresholds for security engagement in support processes • Develop rapid security escalations for access • Utilize secure communications for incidents with ability to verify and authorize actions
  24. 24. Logs→Metrics→Alerts→Actions. Sources: AWS Config; AWS CloudTrail (API calls from/for most services); CloudWatch/CloudWatch Logs (monitoring data from AWS services, custom metrics); Amazon EC2 OS logs; Amazon VPC Flow Logs. Alerts: CloudWatch alarms. Actions: Amazon SNS (email, HTTP/S, SMS, and mobile push notifications), Amazon SQS, AWS Lambda functions
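The Logs→Metrics→Alerts→Actions chain can be sketched in a few calls. This is an illustration only: the metric name, namespace, threshold, and filter pattern (matching unauthorized or denied API calls in CloudTrail logs delivered to CloudWatch Logs) are assumptions, and `logs` and `cloudwatch` are assumed to behave like `boto3.client("logs")` and `boto3.client("cloudwatch")`.

```python
# Hypothetical pattern: CloudTrail events whose errorCode indicates a denied call.
UNAUTHORIZED_PATTERN = (
    '{ ($.errorCode = "*UnauthorizedOperation") || '
    '($.errorCode = "AccessDenied*") }'
)


def wire_log_alert(logs, cloudwatch, log_group, topic_arn,
                   namespace="IRExample", metric="UnauthorizedApiCalls"):
    """Turn matching log events into a metric, then alarm to an SNS topic."""
    # Logs -> Metrics: each matching log event counts as 1.
    logs.put_metric_filter(
        logGroupName=log_group,
        filterName=metric,
        filterPattern=UNAUTHORIZED_PATTERN,
        metricTransformations=[{
            "metricName": metric,
            "metricNamespace": namespace,
            "metricValue": "1",
        }],
    )
    # Metrics -> Alerts -> Actions: alarm on any hit in a 5-minute window
    # and notify the SNS topic (which can fan out to email, SQS, or Lambda).
    cloudwatch.put_metric_alarm(
        AlarmName=metric + "-alarm",
        Namespace=namespace,
        MetricName=metric,
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=[topic_arn],
    )
```

Real thresholds would need tuning per environment; a threshold of 1 is deliberately noisy and is used here only to keep the example short.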
  25. 25. When to Engage AWS? Engage AWS Support any time an event may be occurring that affects your ideal operational state
  26. 26. When Do I Contact AWS Security? • Obtaining permission to perform penetration testing/scanning • Reporting security vulnerabilities • Reporting suspicious emails • Reporting abuse of AWS resources
  27. 27. Engaging Support
  28. 28. Engaging Human Support (ITIL role / AWS role / responsibilities)
      Incident Analyst / Cloud Support Engineer (CSE)
      • Provides initial support and classification of concerns
      • Owns issues; monitors, tracks, and communicates during the issue management process
      • Resolves and supports recovery of concerns not requiring escalation to an AWS service subject matter expert
      • Escalates concerns to the AWS service subject matter experts (as required)
      • Can close issue-related cases when consensus is reached with the customer
      Incident Manager / Technical Account Manager (TAM)
      • Monitors issue details from an AWS internal perspective on the customer’s behalf
      • Investigates and diagnoses concerns, and coordinates between the customer, AWS Cloud Support Engineers, and AWS subject matter experts. These engagements can be via videoconference, telephone conference, or any method the customer chooses.
      • Monitors customer-requested escalations for concerns and provides a conduit for customers to engage internal AWS subject matter experts to meet the objectives of the issue management process
      • Drives the efficiency and effectiveness of the AWS issue management process
      • Produces customer-specific management information such as metrics, reports, and so on
      • Records out-of-scope/intent issues related to service design for consideration toward future service releases and improvements
      Subject Matter Expert / AWS Service Subject Matter Expert
      • Analyzes concerns to identify service restoration actions to be taken
      • Conducts event resolution actions to restore services to customers
      • Assists issue management staff with assessing the impact of any events
  29. 29. Literally Go Here… https://aws.amazon.com/contact-us/
  30. 30. Good IR Is NO ACCIDENT! Build for Speed, Agility, and Security… Use the Ecosystem! • Build securely and verify before deployment (provisioning enclave) • Build in monitoring, metrics, alerts, and messaging • Proactively analyze and preserve data - Resource configs, logs, volatile memory, snapshots • Build forensics AMIs, SGs, storage, and isolated subnets • Build for rapid recovery—automate! • Regularly run incident response simulations (IRS)—iterate and improve! • Incidents DO NOT HAVE TO BE DISASTERS!
  31. 31. Common Objections • Running IR simulations is expensive and high risk…we can’t afford to do “live fire” exercises! • “I am an understaffed, interrupt driven ops organization. I do not have time for drills.” • “What if we fail? We could look bad.”
  32. 32. Why You Should Do It… • If you already do it…just include your cloud! • Helps you understand your AWS environment! • Augments training and readiness—troops fight like they train • Fixes real issues and helps build a culture of continuous improvement • Helps build your own expertise and improve response • Helps meet your security requirements • Cloud allows you to execute quickly and economically • Can you afford not to?
  33. 33. AWS Security Resources • AWS Compliance: https://aws.amazon.com/compliance/ • AWS Security Blog: http://blogs.aws.amazon.com/security/ • AWS Security Center: https://aws.amazon.com/security • Contact the AWS security team: aws-security@amazon.com
  34. 34. Other Incident Response Resources • SANS Reading Room, Incident Response: http://www.sans.org/reading-room/whitepapers/incident • FIRST: http://www.first.org/resources/guides • CERT, Incident Management: http://www.cert.org/incident-management/publications/
  35. 35. APL Incident Response: Incident and Spillage Response. Conrad Fernandes, AWS CSA-A, CISSP, GCFA, Johns Hopkins Applied Physics Laboratory
  36. 36. Johns Hopkins Applied Physics Laboratory • Division of Johns Hopkins University • University Affiliated Research Center • Technically skilled and operationally oriented • Objective and independent • Critical contributions to critical challenges • DoD, NASA, DHS, IC
  37. 37. What APL Missions Require • Reliable and elastic infrastructure: scalable computing and storage (medical image processing, big-data analysis, machine learning) • … with agility (noun, “ability to move quickly and easily”): preconfigured and bootstrapped machine images, scripting, templates to build cloud infrastructure via automation • … while maintaining security and governance: multifactor authentication, security groups, access controls, data encryption, secure monitoring, notifications, incident response • … and compliance with laws and regulations for sensitive data: FOUO/CUI (DoD) commercial and GovCloud, HIPAA (medical)
  38. 38. What APL Cloud Team Provides • IT cloud team works closely with APL mission areas to provide cloud computing services and infrastructure • Designs and architects network and security enterprise wide • Creates the structure for security monitoring and incident response
  39. 39. Comparing Incident Response Contexts
      IR-4 “Incident Handling” (Life’s a breach!)
      • Intention: usually malicious
      • Identification: difficult detection; evasive tactics, exploits
      • Eradication: can be difficult to locate all footholds; may be incomplete
      • Follow-up: lots of lessons learned
      IR-9 “Information Spillage Response” (Cleanup on aisle 9!)
      • Intention: usually inadvertent
      • Identification: always notified
      • Eradication: fairly standard wipe (data sanitization) using DoD processes
      • Follow-up: lots of official paperwork
  40. 40. Incident Response Approach: Preparation → Identification → Containment → Investigation → Eradication → Recovery → Follow-Up. * Applies to all types of IR, including IR-4 (breaches) and IR-9 (spills)
  41. 41. Preparation: If prevention is better than cure … preparation is better than eradication • Train incident handlers to respond to cloud-specific events • Ensure logging is enabled – VPC Flow Logs, AWS CloudTrail, AWS Config, Amazon Simple Notification Service (Amazon SNS) notifications – OS and application logs from Amazon EC2 instances • Collect and aggregate the logs centrally for correlation and analysis – Example: Amazon CloudWatch, Amazon Elasticsearch Service, or Security Information and Event Management (SIEM) vendors (such as Splunk)
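"Ensure logging is enabled" is easy to check programmatically. The sketch below covers only CloudTrail and assumes `cloudtrail` behaves like `boto3.client("cloudtrail")`; similar checks for VPC Flow Logs and AWS Config are left out.

```python
def trails_not_logging(cloudtrail):
    """Return the names of CloudTrail trails that are not currently logging.

    `cloudtrail` is assumed to behave like boto3.client("cloudtrail").
    """
    stopped = []
    for trail in cloudtrail.describe_trails()["trailList"]:
        # Using the ARN also works for trails whose home Region differs.
        status = cloudtrail.get_trail_status(Name=trail["TrailARN"])
        if not status["IsLogging"]:
            stopped.append(trail["Name"])
    return stopped
```

An empty list means every trail visible to the caller reports `IsLogging` as true; it does not prove that a trail exists in the first place, so a preparation check would also want to fail when `describe_trails` returns nothing.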
  42. 42. Preparation: Encryption in the preparation phase renders data as CIPHERTEXT, a huge advantage for spillage cleanup in the eradication phase • Use Amazon Elastic Block Store (Amazon EBS) encryption when creating EBS volumes – Equivalent to full disk encryption (FDE) on corporate laptops • Use Amazon S3 server-side encryption (SSE) for landing Amazon S3 objects – Amazon S3-managed keys (SSE-S3): easiest key management – AWS Key Management Service (AWS KMS)-managed keys (SSE-KMS): additional benefits – SSE with customer-provided keys (SSE-C): customer manages the keys
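As a rough illustration of encrypting at creation time, the helpers below build request parameters for the boto3 `create_volume` and `put_object` calls. The helper names are hypothetical, and passing no KMS key ID is assumed to mean "use the default key" for EBS and SSE-S3 for S3.

```python
def encrypted_volume_params(availability_zone, size_gib, kms_key_id=None):
    """Parameters for ec2.create_volume(**params) with encryption enabled."""
    params = {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "Encrypted": True,
    }
    if kms_key_id:
        # A dedicated key per project makes later key deletion more targeted.
        params["KmsKeyId"] = kms_key_id
    return params


def sse_put_params(bucket, key, body, kms_key_id=None):
    """Parameters for s3.put_object(**params) with server-side encryption."""
    params = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id:
        params["ServerSideEncryption"] = "aws:kms"   # SSE-KMS
        params["SSEKMSKeyId"] = kms_key_id
    else:
        params["ServerSideEncryption"] = "AES256"    # SSE-S3
    return params
```

Usage would look like `ec2.create_volume(**encrypted_volume_params("us-east-1a", 100))`, where the Availability Zone and size are placeholder values.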
  43. 43. Preparation: Multi-account isolation and policy enforcement with AWS Organizations • Enforces the “separation of duties” principle • Limits the blast radius in the event of compromise • Organize accounts along business lines or mission areas • Use overarching service control policies (SCPs) to control sub-accounts with restrictive policies
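A restrictive SCP might look like the following sketch, which denies member accounts the ability to switch off logging. The choice of actions and the policy name are illustrative assumptions, not the policy APL uses; `organizations` is assumed to behave like `boto3.client("organizations")`.

```python
import json


def deny_logging_tamper_scp():
    """Build an example SCP document that blocks disabling audit logging."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyLoggingTamper",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "config:StopConfigurationRecorder",
                "config:DeleteConfigurationRecorder",
            ],
            "Resource": "*",
        }],
    }


def create_scp(organizations, name, document):
    """Register the SCP; attaching it to an OU or account is a separate step."""
    return organizations.create_policy(
        Name=name,
        Description="Example guardrail (illustrative)",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(document),
    )
```

After creation, the policy would still need to be attached to an organizational unit or account (for example with `attach_policy`) before it has any effect.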
  44. 44. AWS Orgs: Example Layout [Figure: example AWS Organizations layout showing OUs (Business Sectors 1 through 6), projects (Project A, B, C), and SCPs]
  45. 45. Identification (also known as “detection”)
      IR-9 “Information Spillage Response”
      • Usually notified about which user accounts and systems have data that need “cleaning up”
      • Can use data loss prevention (DLP) or the new service Amazon Macie
      • Open a spillage case (case #) with AWS Business Support for cross-validation
      IR-4 “Incident Handling”
      • Use behavior-based rules for detection and searching
        - CloudWatch rules
        - SIEM tools, such as Splunk for AWS or Amazon Elasticsearch Service (Kibana visualizations)
  46. 46. Identification: Example Using Splunk for AWS
  47. 47. Containment and Investigation
      IR-4 “Incident Handling”
      • Multiple use cases for live-box and dead-box isolation and forensics
      • Investigation is complex: correlation, threat intelligence, timeline analysis
      • Beyond the scope of this presentation
      IR-9 “Information Spillage Response”
      • Closer to live-box forensics
      • Investigation is easier: usually limited to known users and host machines
      • Isolation using a security group
        - Via console or automation for speed (see example below)
      Containment isolation:
      • Save the current security group of the host or instance
      • Isolate the host using restrictive ingress and egress security group rules
        CLI> aws ec2 modify-instance-attribute --instance-id <instance-id> --groups "<Isolation-SG>"
      • Isolation-SG: only SSH (22) or RDP (3389) ingress rules with the IR enclave as source; no egress
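The two containment steps (save the current groups, then swap in the isolation group) can be automated as in this sketch. It assumes `ec2` behaves like `boto3.client("ec2")`, a VPC instance with a single network interface, and an isolation security group that already exists; the function name is hypothetical.

```python
def isolate_instance(ec2, instance_id, isolation_sg_id):
    """Swap an instance onto the isolation security group.

    Returns the original security group IDs so they can be stored with
    the incident record and restored during recovery.
    """
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    # Step 1: save the current security groups before touching anything.
    original_group_ids = [g["GroupId"] for g in instance["SecurityGroups"]]
    # Step 2: replace them with the single restrictive isolation group.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[isolation_sg_id],
    )
    return original_group_ids
```

Returning the saved group IDs, rather than keeping them only in memory, is deliberate: the responder who performs recovery may not be the one who performed containment.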
  48. 48. Investigation: Example using SIEM/Splunk for AWS
  49. 49. Eradication: If encryption was used during preparation … it’s as simple as deleting objects and keys
      • If Amazon Elastic Block Store (Amazon EBS) encryption was used for volumes:
        - Delete the spilled file
        - Create a new encrypted volume, copying all the good files (minus the spillage)
        - Delete the affected encrypted volume and delete the key used to encrypt it
      • If Amazon S3 server-side encryption (SSE) was used for Amazon S3 objects:
        - If Amazon S3-managed keys (SSE-S3) were used: simply delete the object!
        - If AWS KMS-managed (SSE-KMS) or customer-provided (SSE-C) keys were used: delete the object and the keys used to encrypt it (the customer master keys (CMKs) for SSE-KMS, or your own copy of the key for SSE-C)
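For the S3 case, a minimal sketch of "delete the object and the key" follows. It assumes `s3` and `kms` behave like the corresponding boto3 clients and that the bucket is not versioned (on a versioned bucket a plain delete only adds a delete marker, so each version would need to be removed explicitly). AWS KMS does not delete keys immediately: deletion is scheduled with a waiting period of 7 to 30 days, and it makes everything encrypted under that key unrecoverable, so a dedicated key per data set is assumed.

```python
def eradicate_s3_spill(s3, bucket, key, kms=None, kms_key_id=None,
                       pending_days=7):
    """Delete a spilled S3 object and, for SSE-KMS, schedule key deletion.

    Returns the scheduled key deletion date, or None when no KMS key
    was involved (for example, SSE-S3).
    """
    if not 7 <= pending_days <= 30:
        raise ValueError("KMS waiting period must be between 7 and 30 days")
    s3.delete_object(Bucket=bucket, Key=key)
    if kms is not None and kms_key_id:
        response = kms.schedule_key_deletion(
            KeyId=kms_key_id,
            PendingWindowInDays=pending_days,
        )
        return response.get("DeletionDate")
    return None
```

The returned deletion date is worth recording in the spillage report, since the key remains recoverable until that date passes.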
  50. 50. Eradication: If encryption was NOT USED during preparation … you may be able to sanitize EBS volumes only
      • Copy DoD (or authorized) sanitization tools to affected EC2 hosts
        IR-Net# scp -i "host-private-key" bcwipe.exe ec2-user@TargetHost.amazonaws.com:[root_volume/bcwipe.exe]
      • Remote connect to the host via SSH (port 22) or RDP (port 3389) to perform sanitization actions
        IR-Net# ssh -i "host-private-key" ec2-user@TargetHost.amazonaws.com
      • Once on the target host, wipe files and slack space, per authority (example: DoD 5220.22-M)
        AffectedHost# bcwipe <spilled file>
  51. 51. Recovery and Follow-Up
      Recovery
      • Restore network access to its original state (prior to isolation)
        - Restore the previous security group ingress and egress rules
        CLI> aws ec2 modify-instance-attribute --instance-id <instance-id> --groups "<Original Security Group>"
      Follow-Up
      • Verify deletion of data encryption keys (if EBS or Amazon S3 encryption was used)
      • Cross-validate with the AWS Support case number
      • Report spillage findings and response actions
        - In accordance with DoD 5220 or appropriate authorities
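Recovery and part of follow-up can be scripted in the same style. This sketch assumes the original security group IDs were saved at containment time, and that `ec2` and `kms` behave like the corresponding boto3 clients; both function names are hypothetical.

```python
def restore_security_groups(ec2, instance_id, original_group_ids):
    """Put the saved security groups back and confirm they took effect."""
    if not original_group_ids:
        raise ValueError("no saved security groups to restore")
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=list(original_group_ids),
    )
    # Read the instance back to verify the restore.
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    current = [g["GroupId"] for g in instance["SecurityGroups"]]
    return sorted(current) == sorted(original_group_ids)


def key_pending_deletion(kms, key_id):
    """Follow-up check: is the KMS key scheduled for deletion?"""
    metadata = kms.describe_key(KeyId=key_id)["KeyMetadata"]
    return metadata["KeyState"] == "PendingDeletion"
```

The key check only confirms that deletion is scheduled; confirming final deletion would require a later check after the waiting period ends.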
  52. 52. Takeaways • Understand the differences between IR-4 (threat based) and IR-9 (spills) and plan the handling and response accordingly • Use a phased approach for IR: create well-defined steps and operational procedures, including training for the response teams • The preparation step is critical • Use encryption for EBS volumes, Amazon S3 storage, and wherever possible • Use AWS Organizations to separate projects/functions and limit the blast radius • Enable all critical logging mechanisms (EC2 OS, AWS CloudTrail, VPC Flow Logs) • Create detection rules in Amazon CloudWatch, Amazon ES, or third-party SIEM • Use the AWS CLI or SDKs, especially for quick “containment”, such as using predefined restrictive security groups
  53. 53. Thank you!
