Playbook_Ultimate Guide to Incident Response in Azure.pdf

Ultimate Guide to Incident Response in Azure

Table of Contents
Introduction
Azure Incident Response Planning
Before the Incident
Post-Incident
Service Specific Advice & Tools
Investigating Active Directory
Investigating Virtual Machines
Investigating Azure Kubernetes Service (AKS)
Open Source Tools
Native Azure Tools
Cado Security Tools
Further Reading
For More Information

Introduction
Investigating and responding to incidents in cloud environments like Azure is
fundamentally different from doing so on-premises, and without the right tools and
processes in place it can be more complicated. There are over 200 products and services
in Azure, each with different security best practices and data sources. While the cloud
can make incident response more complex, it also opens up new possibilities. For
example, by leveraging cloud resources to collect, process and store evidence, you can
expedite the end-to-end incident response process in ways that would be unthinkable
on-premises.
This playbook offers recommendations to help security teams get to
this ideal state, including:
● How to best prepare for incidents (which ultimately enables a more efficient
response).
● Specific advice and playbooks on how to respond to threats identified in the
most common Azure services.
Azure’s incident response advice mentions two critical components to consider when
measuring how well your organization is prepared to reduce risk: mean time to
acknowledge (MTTA) and mean time to remediate (MTTR). The best practices
outlined in this playbook were crafted with these two key metrics in mind with the goal of
yielding noticeable improvement in both.
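As a rough illustration of how these two metrics might be tracked, the sketch below averages the time from detection to acknowledgement (MTTA) and from detection to remediation (MTTR). The incident records and field names are hypothetical, and measuring MTTR from the moment of detection is an assumption; some teams start the clock at acknowledgement instead.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"detected": datetime(2023, 1, 5, 9, 0),
     "acknowledged": datetime(2023, 1, 5, 9, 20),
     "remediated": datetime(2023, 1, 5, 15, 0)},
    {"detected": datetime(2023, 2, 1, 22, 0),
     "acknowledged": datetime(2023, 2, 1, 22, 40),
     "remediated": datetime(2023, 2, 2, 4, 0)},
]

def mean_interval(records, start_key, end_key):
    """Average the elapsed time between two timestamps across all records."""
    seconds = mean((r[end_key] - r[start_key]).total_seconds() for r in records)
    return timedelta(seconds=seconds)

# Assumption: both metrics are measured from the detection time.
mtta = mean_interval(incidents, "detected", "acknowledged")
mttr = mean_interval(incidents, "detected", "remediated")
print(f"MTTA: {mtta}, MTTR: {mttr}")
```

Recomputing these figures after each incident or tabletop exercise gives a simple way to see whether the practices below are having an effect.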

Azure Incident Response Planning
Before the Incident
The following best practices can help security teams reduce the likelihood that an
incident will occur, and in the event that it does, drastically decrease recovery time.
Know Your Data
Identify your crown jewels. Do you have particularly sensitive information,
like Personally Identifiable Information (PII) or Payment Card Industry (PCI)
data? If so, you need to know exactly where it lives and what systems
process the data. This also includes any backups or logs that might shadow
the original data.
Have Backups, And Test They Work
A disaster recovery plan can mitigate not just security incidents like
ransomware, but also other likely events such as data center hardware
failure. Ransomware is a high risk due to both high impact and relatively
high likelihood of occurrence.
Restrict Administrative Accounts
In general, follow the principle of least privilege. In particular, Microsoft
provides detailed advice on how to secure administrative accounts in Azure
AD.
Require Multi-Factor Authentication for all User Accounts
This can be easily enabled by following this guide.
Review Azure Security Center Settings
Azure Security Center is a centralized view of both security issues and
configuration options. Unfortunately, many of the most useful features need
to be enabled (at cost) in advance of any breach.
Limit Network and Remote Access
Limit any connectivity to the internet from your machines as much as
possible. A common security issue in Azure is Windows machines with RDP
accessible from the internet. This can put you at particular risk of brute-force
ransomware attacks.
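One way to look for this exposure is to review inbound network rules for any that allow port 3389 from the internet. The sketch below works on simplified, hypothetical rule records (the field names and the set of "internet" source values are assumptions); real rule exports carry more fields and may list several ports or prefixes per rule.

```python
# Simplified, hypothetical rule records; not the full schema of a real export.
rules = [
    {"name": "allow-rdp-any", "direction": "Inbound", "access": "Allow",
     "source": "*", "port": "3389"},
    {"name": "allow-rdp-vpn", "direction": "Inbound", "access": "Allow",
     "source": "10.0.0.0/8", "port": "3389"},
    {"name": "allow-https", "direction": "Inbound", "access": "Allow",
     "source": "Internet", "port": "443"},
]

# Assumption: these source values all mean "anywhere on the internet".
INTERNET_SOURCES = {"*", "0.0.0.0/0", "Internet"}
RDP_PORT = 3389

def covers_port(port_spec, port):
    """Return True if a spec such as '3389', '3000-4000' or '*' includes port."""
    if port_spec == "*":
        return True
    if "-" in port_spec:
        low, high = port_spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(port_spec) == port

def rdp_exposed(rule):
    """Return True if the rule allows inbound RDP from an internet source."""
    return (rule["direction"] == "Inbound"
            and rule["access"] == "Allow"
            and rule["source"] in INTERNET_SOURCES
            and covers_port(rule["port"], RDP_PORT))

exposed = [r["name"] for r in rules if rdp_exposed(r)]
print(exposed)
```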

Encryption
The general advice is to ensure data is always encrypted at rest and in
transit. There are open discussions around how useful encrypting data at
rest is with some cloud services. However, you may have particular
requirements here if you are in a regulated industry such as finance or
healthcare.
Enable Logging
“Forensic readiness” will help you not only detect incidents earlier but also
make investigations more thorough and efficient. As you can imagine, the
more useful data you have, the more likely you will be able to find the root
cause of an incident. Ensuring you have the right logs enabled can make all
the difference.
Azure has a number of different logs, including:
● Activity Logs: Management events against your subscription e.g., creating
a Virtual Machine. Retrieve from the Azure Monitor>Activity Log Service.
● Resource Logs: Data plane events, for example, retrieving a key from a
store. Enabled from Diagnostic settings.
● Azure Active Directory Logs: User events and other things generally
operated by AD. Enabled from AD > Diagnostic Settings.
● Windows Azure Diagnostics: Logs collected from inside the host. These
can be forwarded to your SIEM.
● Application Logs: General application health and performance.
● Storage Analytics Logs: Specific to the storage service.
● Network Security Group Flow Logs: Minimal flow records for network traffic.
● Security Center: Alarms from potentially malicious events.
Both Datadog and Secureworks have great tutorials on how to ensure full logging is
enabled.
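A simple way to keep track of forensic readiness is to compare the log sources you have actually enabled against the list above. The sketch below is a minimal checklist helper; the inventory of enabled sources is hypothetical and would need to be gathered by hand or from your own tooling.

```python
# Log sources named in the list above.
EXPECTED_SOURCES = {
    "Activity Logs",
    "Resource Logs",
    "Azure Active Directory Logs",
    "Windows Azure Diagnostics",
    "Application Logs",
    "Storage Analytics Logs",
    "Network Security Group Flow Logs",
    "Security Center",
}

def missing_sources(enabled):
    """Return the expected log sources that are not enabled, sorted by name."""
    return sorted(EXPECTED_SOURCES - set(enabled))

# Hypothetical inventory of what is enabled today.
enabled_today = ["Activity Logs", "Security Center", "Application Logs"]
gaps = missing_sources(enabled_today)
for source in gaps:
    print(f"Not enabled: {source}")
```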

Be Prepared
Periodically run tabletop exercises to simulate incidents and build muscle
memory across both executive and operational teams.
Executives should be prepared to answer the following questions in
advance of any incident:
● Under what circumstances do you notify law enforcement, regulatory
authorities, auditors and the board?
● Will you pay a ransom? If so, how?
● If required, which outsourced incident response firm will you work with?
● If you lose access to core IT systems for an extended period of time, do
you have business continuity and disaster recovery plans in place?
● If the primary communication methods are either unavailable or
compromised, do you have backup or out-of-band communications
available?
● What working hours are incident responders expected to work in a
high-severity incident?
● Do you have access to the data required to perform an investigation in all
products and services?

Post-Incident
Gather the Incident Response Team
As part of incident response planning, organizations should craft a
well-thought-out and rehearsed incident response and crisis
communications plan with defined roles and responsibilities mapped out to
limit the overall impact should an incident occur. This includes preparing
internal teams and external incident response service providers in steps to
take and actually exercising the plan end to end regularly. Ideally, this plan
has been pre-approved during incident response planning so that incident
response actions can kick off as soon as possible post-incident
identification.
The roles in an incident response team will vary depending on both the size of your
team and the scale of the incident. Most often, one person will take on a number of
roles. A typical example of the roles in an incident response team is:
● Leadership role - Commands the investigation and directs activities.
● Investigator role - Identifies incident root cause and the full scope of
compromised systems and data.
● Responder role - Works with internal teams and 3rd parties to recover
and restore systems and services and plan and coordinate remediation
steps.
● Documentation role - Records findings and actions to support the
investigation, remediation and potentially legal representation. The legal
representation may also be handled by inside or outside counsel (though only
a small number of incidents end up bringing in a legal representative).

Understand the Environment
It is important to gain an understanding of the environment in which the
incident occurred. If you are an internal SOC, you may already know the
answers to these questions in advance of an incident:
● Where is sensitive data stored?
● How are users connected to Azure Active Directory?
● Who are the administrators?
● Where are logs stored?
● What Azure Products and Services are in use?
● Is Active Directory connected to on-premises systems or Microsoft 365?
There are a number of tools that can answer these questions automatically—see
the Tools section below.
Collect the Right Data
You can’t investigate what you don’t have access to, so it’s important to
ensure you have the right access and you know which data sources will be
most valuable in your investigation prior to an incident occurring. Once an
incident is detected, it’s important to collect information relating to systems
that may be compromised, including both meta-data (typically logs) and
full content (disk images, volatile data, etc.). You will need to carefully
scope this phase as it can prove difficult to find the right balance between
collecting too much and collecting too little. A thorough investigation can
require a lot of data - so collecting data in phases to gradually narrow the
scope of your investigation is important. For example, it’s useful to perform
an initial triage collection across a larger set of systems to determine
which systems require a more in-depth full disk analysis to increase
overall efficiency while getting the answers you need.
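The phased approach described above can be sketched as a small prioritization step: run a lightweight triage everywhere, then queue only the hosts with findings for full disk collection. The host names, findings and threshold below are all hypothetical.

```python
# Hypothetical triage results: host name -> findings from a lightweight collection.
triage = {
    "web-01": ["suspicious scheduled task", "unknown binary in temp folder"],
    "web-02": [],
    "db-01": ["failed logon burst"],
}

def select_for_full_disk(results, min_findings=1):
    """Return hosts with at least min_findings findings, most findings first."""
    flagged = [(host, len(found)) for host, found in results.items()
               if len(found) >= min_findings]
    # Sort by descending finding count, then by host name for a stable order.
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return [host for host, _ in flagged]

full_disk_queue = select_for_full_disk(triage)
print(full_disk_queue)
```

Raising `min_findings` narrows the second phase further when the first pass flags more systems than the team can image in time.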

Cado Response can automatically collect and analyze full copies of most data sources
in Azure, AWS and GCP with a single click. See the tools section for more information.
Perform the Investigation
First, identify the scope of the investigation by answering the following
questions:
● Do you just need to recover services?
● Do you need to identify the root cause of the incident so it doesn’t
happen again?
Most investigations start with a suspicious event, such as a malware
detection on a system. The investigation then progresses as you pivot
based on timestamps or key findings and artifacts. For example:
● What other events happened just before or after the known bad
event?
● Are there other suspect files in the same folder?
● Are other systems connected to known bad events or known
compromised systems somehow?
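Pivoting on timestamps, as in the first question above, can be sketched as a filter over a merged timeline. The events, field names and five-minute window below are hypothetical; a real timeline would be parsed from your own logs and disk artifacts.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed events from several sources.
events = [
    {"time": datetime(2023, 3, 1, 10, 0), "host": "vm-a", "action": "user logon"},
    {"time": datetime(2023, 3, 1, 10, 58), "host": "vm-a", "action": "file written: tool.exe"},
    {"time": datetime(2023, 3, 1, 11, 0), "host": "vm-a", "action": "malware detection"},
    {"time": datetime(2023, 3, 1, 11, 3), "host": "vm-b", "action": "remote logon from vm-a"},
    {"time": datetime(2023, 3, 1, 14, 0), "host": "vm-c", "action": "backup job"},
]

def events_near(timeline, pivot_time, window=timedelta(minutes=5)):
    """Return events within window before or after pivot_time, oldest first."""
    nearby = [e for e in timeline if abs(e["time"] - pivot_time) <= window]
    return sorted(nearby, key=lambda e: e["time"])

known_bad = datetime(2023, 3, 1, 11, 0)
related = events_near(events, known_bad)
for e in related:
    print(e["time"], e["host"], e["action"])
```

Each event returned can become the next pivot, for example by repeating the search around the remote logon on the second host.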
Below we provide suggested investigative steps based on the Azure
service involved, the type of incident, and recommendations on tools that
may be useful.

Investigation Playbooks
Microsoft also provides playbooks for particular scenarios in Azure:
● Phishing Investigation
● Password Spray Investigation
● Ransomware Attack
● App Consent Grant
● Compromised or Malicious Application
● Forensic / Legal Investigation
Containment & Remediation
During the containment phase of an incident, some questions that will be
important to answer include:
● Can you limit the damage before it gets worse?
● Do you need to isolate virtual machines or services?
● Can you permanently bring the environment back to a safe state?
● If you have identified the root cause, can you fix the original issue?
If not, can you mitigate the risk with other preventative technology
or additional monitoring to identify future use?
● Have you hunted for other potential compromises? For example, by
importing key systems and scanning for malware.
● Have you reviewed the best practices above and confirmed if any
need to be implemented?
● Have you enabled additional monitoring where gaps have been
identified?
● Have you documented all findings and actions taken?
● Do you need to publish an incident report?
● Have you identified lessons learned and conducted a wrap-up
meeting?

Service Specific Advice & Tools
Investigating Active Directory
Azure Active Directory (Azure AD) is Microsoft’s cloud-based identity and access
management service. It combines core directory services, application access
management, and identity protection into a single solution. It enables single sign-on and
multi-factor authentication to help protect users from password fatigue and phishing
attacks. It also provides group management and device management capabilities.
When responding to an incident:
● Identify highly privileged users by using the Azure Portal and Azure Graph
● Identify which applications AD provides authentication for
● Identify and deactivate potentially compromised user accounts
● Identify and disable legacy authentication methods
Will Oram has made a great guide on how to specifically respond to incidents involving
Azure Active Directory.
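As a minimal sketch of the last step, the snippet below scans exported sign-in records for accounts that still authenticate over legacy protocols. The record layout, field names and the set of legacy client labels are assumptions for illustration; check them against the values in your own sign-in export before relying on the result.

```python
# Assumed labels for legacy protocols; adjust to match your own sign-in export.
LEGACY_CLIENTS = {"IMAP4", "POP3", "Authenticated SMTP",
                  "Exchange ActiveSync", "Other clients"}

# Hypothetical sign-in records with made-up field names.
sign_ins = [
    {"user": "alice@example.com", "client_app": "Browser"},
    {"user": "bob@example.com", "client_app": "IMAP4"},
    {"user": "bob@example.com", "client_app": "POP3"},
    {"user": "carol@example.com", "client_app": "Mobile Apps and Desktop clients"},
]

def legacy_auth_users(records):
    """Map each user to the sorted legacy client apps seen in their sign-ins."""
    seen = {}
    for rec in records:
        if rec["client_app"] in LEGACY_CLIENTS:
            seen.setdefault(rec["user"], set()).add(rec["client_app"])
    return {user: sorted(apps) for user, apps in seen.items()}

legacy = legacy_auth_users(sign_ins)
print(legacy)
```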
Investigating Virtual Machines
Azure Virtual Machines (VMs) are a cloud computing service from Microsoft that
enables users to create, configure, and manage virtual machines in the cloud. VMs can
be created from pre-configured images or from scratch and can be configured to run a
variety of operating systems and applications. Azure VMs are available in a variety of
sizes and can be scaled up or down to meet changing computing needs.

Azure provides the functionality to export the disk images of Virtual Machines in VHD
format for forensic analysis. In the Azure Portal, this can be done by selecting the disk,
then selecting Create Snapshot. It can also be done on the command line using the
az snapshot create command:
az snapshot create --name
                   --resource-group
                   [--accelerated-network {false, true}]
                   [--architecture {Arm64, x64}]
                   [--copy-start {false, true}]
                   [--disk-access]
                   [--disk-encryption-set]
                   [--edge-zone]
                   [--encryption-type {EncryptionAtRestWithCustomerKey, EncryptionAtRestWithPlatformAndCustomerKeys, EncryptionAtRestWithPlatformKey}]
                   [--for-upload {false, true}]
                   [--hyper-v-generation {V1, V2}]
                   [--incremental {false, true}]
                   [--location]
                   [--network-access-policy {AllowAll, AllowPrivate, DenyAll}]
                   [--no-wait]
                   [--public-network-access {Disabled, Enabled}]
                   [--size-gb]
                   [--sku {Premium_LRS, Standard_LRS, Standard_ZRS}]
                   [--source]
                   [--tags]
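In practice only a few of these options are usually needed. The sketch below builds an az snapshot create invocation from the --name, --resource-group, --source and --incremental options listed above, and prints it for review rather than running it. The snapshot, resource group and disk names are hypothetical.

```python
import shlex

def snapshot_command(name, resource_group, source, incremental=False):
    """Build the argument list for `az snapshot create` without running it."""
    cmd = ["az", "snapshot", "create",
           "--name", name,
           "--resource-group", resource_group,
           "--source", source]
    if incremental:
        cmd += ["--incremental", "true"]
    return cmd

# Hypothetical names; replace with the disk and resource group under investigation.
cmd = snapshot_command("evidence-vm01-osdisk", "ir-evidence-rg", "vm01_OsDisk_1")
print(shlex.join(cmd))

# To execute it, assuming the Azure CLI is installed and logged in:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when disk or resource group names contain unusual characters.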
The Cado Response platform can import the full disk images of Azure Virtual Machines
and process and analyze them, including running threat intelligence against the
contents and indexing file contents.

Investigating Azure Kubernetes Service (AKS)
Azure Kubernetes Service (AKS) is a managed Kubernetes service that lets you quickly
deploy and manage containerized applications in the cloud. AKS reduces the
complexity and operational overhead of managing Kubernetes by offloading much of
that responsibility to the Azure cloud. As a hosted Kubernetes service, AKS is quickly
becoming a popular choice for developers and enterprises that want to deploy
applications in containers.
Cado Response can collect the full contents of containers running on AKS by retrieving
a copy of the container disk or files over the Kubernetes control plane using Cado Host.

Open Source Tools
The community has created a number of tools that may be of use when responding to
incidents in Azure:
● Azure AD Incident Response PowerShell Module
● Sparrow (Identifies compromised accounts in AD)
● Mandiant Azure AD Investigator
● AzureHound (Collects data from Azure)
● Hawk (Retrieves data for Microsoft 365 investigations)
● CrowdStrike Reporting Tool for Azure (Identifies possible issues)
● Cloud Forensic Utils (Retrieves forensic data from Virtual Machines)
Native Azure Tools
Microsoft provides advice on how to use the following platforms to investigate security
incidents:
● Azure Security Center
● Azure Sentinel (Automation Tutorial)
● Defender (Part Two)

Cado Security Tools
Cado Security’s Platform can automate the end-to-end incident response process in Azure
environments. We offer both a free trial and a free community edition.
Further Reading
If you are responding to incidents in Azure, you may find the following resources useful:
● Security Best Practices by Microsoft
● Azure Security Response in the Cloud by Microsoft
● Incident Response Reference Guide by Microsoft & EY
● Azure AD Incident Response by Will Oram at PwC
Microsoft provides a number of lists and checklists of best practices for security
in Azure:
● Top 10 Azure security best practices
● Azure security best practices and patterns
● Azure Operational Security best practices
● Security best practices for Azure solutions
