SlideShare a Scribd company logo
1 of 18
Dr. Anton Chuvakin Office of the CISO, Google Cloud
SOC, SecOps, Security
Operations Reminder
01
A security operations center provides centralized
and consolidated cybersecurity incident
prevention, detection and response capabilities.
Gartner
First Things First: A SOC is Still …
a SOC :-)
SOC is first a team.
Next a process.
And it uses
technology too.
2003 or 2023? Sec Ops is Ripe
for Transformation
We can’t store and analyze
all data, resulting in
blindspots
It’s cost prohibitive to
ingest all the data we need
It takes too long to
investigate alerts
We struggle to build
effective detection and
have too many false
positives/negatives
Our processes are too
manual, we are too slow to
respond to and remediate
threats
We don’t have enough
skilled engineers to make
everything work
SRE, DevOps
and Modern IT
02
BARD SAYS: What is the essence
of the DevOps revolution in IT?
DevOps is a set of practices that combines software development
(Dev) and IT operations (Ops). It aims to shorten the systems
development life cycle and provide continuous delivery with high
quality. DevOps is a culture, movement, or practice
that emphasizes the collaboration and communication of both
software developers and other information technology professionals
while automating the process of software delivery and infrastructure
changes.
The DevOps revolution in IT is about breaking down the silos
between development and operations teams and creating a more
collaborative and efficient way of working. It’s about automating
tasks, reducing manual errors, and improving communication and
collaboration.
DevOps can help organizations deliver software faster, more
reliably, and with higher quality. It can also help
organizations reduce costs and improve efficiency.
Google vs Enterprise “SecOps”
Problem What does Google do? What do most enterprises do?
Efficiency Automation/SRE is a mindset – part of the hiring
process, part of OKRs, and performance reviews
Experimenting with SOAR, full adoption is tough due
to minimal automation culture
Employee Shortage Requires coding interviews, high pay, attracts the
best, invests in growth
Hires traditional roles, no coding, rarely outsources,
less pay, less growth, more stress
Employee Burnout 40/40/20 between eng, operations, and learning Utilization is almost always >100%
Expensive Investment in efficiency solves for human costs Cost-prohibitive data ingestion, oftentimes paying
SIEM + DIY, increasing $ from complexity
Efficacy TI strongly embedded in D&R, mostly utilized
towards proactive work, strong collaboration across
Alphabet & benefits from developer hygiene
CTI team produces great reports, SOC doing fire
drills, >90% false positive rate, uneven distribution of
skill (Tier 3)
5 SRE Lessons for SOC
03
Let’s focus on
5 key areas today
Eliminate Toil
Use SLOs
Evolve Automation
Practice Release Engineering
Strive for Simplicity
Causes of Toil Less Gathering, More Analysis –
basics to automate
Key Activities To Implement
1. Too much technical debt
2. Priorities or goals are not aligned
3. Lack of training or support
4. Lack of collaboration
5. The business value to fix is too hard to
realize
● Gathering machine information
● Gathering user information
● Process executions
● All context needed to help get to final
(human) judgement
Activity
Train your team on toil & automation
Create an Automation Queue
Implement Blameless Postmortems
Conduct Weekly Incident Reviews
Implement SOAR
Hire Automation Engineer(s)
Implement CD/CR pipelines with metrics
Eliminate Toil
“...manual, repetitive, automatable, tactical, devoid of
enduring value, and that scales linearly as a service grows.”
01
● Analyst utilization gets optimized
● More creative work, less toil
● Time back to do more proactive work
● Deeper operationalization of intel
● SecOps can scale with the business!
Unit
costs
per
event
Evolve
Automation
02
10X is an Underestimate!
Use SLOs
Tips, gotchas, and core metrics to consider
03
Core metrics
Metric
event volume
event source counts
pipeline latency
triage time median
triage time at 95%
Incident resolution times
Common metrics (false positives, # of
incidents, etc)
Key tips Gotchas
1. Optimize metrics for optimal value
2. Manage with indicators + objectives
3. Metrics matter (in context)
4. Defeating attackers beats SLOs
5. Choose metrics that actually matter
6. Make your SLO’s open (within your
company)
● Fast =/= better – don’t incentivize
speed, incentivize thoroughness
● More =/= better – solving 5000 cases
manually is not better than automating
of that; #NoHeroes
Practice Release Engineering
04
Ad Hoc
Visibility
Significant Development
effort up front to implement
playbooks
Review playbooks when a
major problem occurs
Response
Orchestration
Security
Analytics
Significant Development
effort up front to implement
use cases
Add/Update detections in
response to major new threat
Onboard log sources as part
of major tech transformation
Review logs for new sources
when a problem comes up
Periodic
Quarterly review of playbook
performance and effectiveness
Dev Sprint to update playbooks
Quarterly review of detection efficacy
Update/Deprecate ineffective
detections
Add/Update detections in response to
major threat
Onboard log sources annual or
quarterly planned schedule
Review data monthly for new log
sources and to identify issues/outages
Continuous
Real-time alerts for detection efficacy drift
Update/Deprecate ineffective detections
at point of discovery
Active Threat Monitoring to proactively
identify new threats to build detections for
Onboard new log sources as they are
ready.
Real-time identification of new log sources
or log drops
Automatic creation of alerts for handling
Live Dashboards showing performance
and accuracy metrics for playbooks
Update/Deprecate ineffective playbooks
at point of discovery
Daily Review of SecOps work queues to
identify automation opportunities
"Complex systems require substantial human expertise in
their operation and management. This expertise changes in
character as technology changes but it also changes because
of the need to replace experts who leave. In every case,
training and refinement of skill and expertise is one part of the
function of the system itself. At any moment, therefore, a given
complex system will contain practitioners and trainees with
varying degrees of expertise.
Critical issues related to expertise arise from (1) the need to
use scarce expertise as a resource for the most difficult or
demanding production needs and (2) the need to develop
expertise for future use."
Human expertise in complex systems is
constantly changing
Strive for Simplicity
05
One consequence of not striving for simplicity
https://how.complexsystems.fail/
Actions
Reduce toil in your SOC -
shift toil to machines
Use SLOs / metrics
to drive change
Evolve automation in SIEM,
SOAR, threat intel, etc
Practice release engineering
for consistent improvement
Strive for simplicity with
processes, technology stack, etc
Improvement
The Power of
Continuous
Improvement
Exponential growth happens faster
when compounded more frequently
Organizing your people and processes
around continuous improvement means
more agility and less resources
Periodic improvement strategies leave
capability gaps between sprints Time
Resources
“Achieving Autonomic Security
Operations: Reducing toil”
“Achieving Autonomic Security
Operations: Why metrics matter (but
not how you think)”
“More SRE Lessons for SOC:
Release Engineering Ideas”
“Achieving Autonomic Security
Operations: Automation as a Force
Multiplier”
“More SRE Lessons for SOC:
Simplicity Helps Security”
safer together
thank you

More Related Content

What's hot

AI for security or security for AI - Sergey Gordeychik
AI for security or security for AI - Sergey GordeychikAI for security or security for AI - Sergey Gordeychik
AI for security or security for AI - Sergey Gordeychik
Sergey Gordeychik
 

What's hot (20)

AI for security or security for AI - Sergey Gordeychik
AI for security or security for AI - Sergey GordeychikAI for security or security for AI - Sergey Gordeychik
AI for security or security for AI - Sergey Gordeychik
 
DevSecOps and the CI/CD Pipeline
 DevSecOps and the CI/CD Pipeline DevSecOps and the CI/CD Pipeline
DevSecOps and the CI/CD Pipeline
 
10X SOC - SANS Blue Summit Keynote 2021 - Anton Chuvakin
10X SOC - SANS Blue Summit Keynote 2021 - Anton Chuvakin10X SOC - SANS Blue Summit Keynote 2021 - Anton Chuvakin
10X SOC - SANS Blue Summit Keynote 2021 - Anton Chuvakin
 
Static Analysis Security Testing for Dummies... and You
Static Analysis Security Testing for Dummies... and YouStatic Analysis Security Testing for Dummies... and You
Static Analysis Security Testing for Dummies... and You
 
Observability at Scale
Observability at Scale Observability at Scale
Observability at Scale
 
AIOps - The next 5 years
AIOps - The next 5 yearsAIOps - The next 5 years
AIOps - The next 5 years
 
SAST vs. DAST: What’s the Best Method For Application Security Testing?
SAST vs. DAST: What’s the Best Method For Application Security Testing?SAST vs. DAST: What’s the Best Method For Application Security Testing?
SAST vs. DAST: What’s the Best Method For Application Security Testing?
 
DevSecOps What Why and How
DevSecOps What Why and HowDevSecOps What Why and How
DevSecOps What Why and How
 
Automating the mundanity of technique IDs with ATT&CK Detections Collector
Automating the mundanity of technique IDs with ATT&CK Detections CollectorAutomating the mundanity of technique IDs with ATT&CK Detections Collector
Automating the mundanity of technique IDs with ATT&CK Detections Collector
 
Secure Software Development Lifecycle - Devoxx MA 2018
Secure Software Development Lifecycle - Devoxx MA 2018Secure Software Development Lifecycle - Devoxx MA 2018
Secure Software Development Lifecycle - Devoxx MA 2018
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
 
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
 
Splunk workshop-Threat Hunting
Splunk workshop-Threat HuntingSplunk workshop-Threat Hunting
Splunk workshop-Threat Hunting
 
ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...
ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...
ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...
 
Modern SOC Trends 2020
Modern SOC Trends 2020Modern SOC Trends 2020
Modern SOC Trends 2020
 
cyber-security-reference-architecture
cyber-security-reference-architecturecyber-security-reference-architecture
cyber-security-reference-architecture
 
Building a Next-Generation Security Operation Center Based on IBM QRadar and ...
Building a Next-Generation Security Operation Center Based on IBM QRadar and ...Building a Next-Generation Security Operation Center Based on IBM QRadar and ...
Building a Next-Generation Security Operation Center Based on IBM QRadar and ...
 
DevSecOps: What Why and How : Blackhat 2019
DevSecOps: What Why and How : Blackhat 2019DevSecOps: What Why and How : Blackhat 2019
DevSecOps: What Why and How : Blackhat 2019
 
Building a Cyber Security Operations Center for SCADA/ICS Environments
Building a Cyber Security Operations Center for SCADA/ICS EnvironmentsBuilding a Cyber Security Operations Center for SCADA/ICS Environments
Building a Cyber Security Operations Center for SCADA/ICS Environments
 
Security Operations Center (SOC) Essentials for the SME
Security Operations Center (SOC) Essentials for the SMESecurity Operations Center (SOC) Essentials for the SME
Security Operations Center (SOC) Essentials for the SME
 

Similar to SOC Lessons from DevOps and SRE by Anton Chuvakin

2011 09 18 United "Platitudes, reality and promise"
2011 09 18 United "Platitudes, reality and promise"2011 09 18 United "Platitudes, reality and promise"
2011 09 18 United "Platitudes, reality and promise"
Gene Kim
 
2011 09 19 LSPE Dev Ops Cookbook 1a
2011 09 19 LSPE Dev Ops Cookbook 1a2011 09 19 LSPE Dev Ops Cookbook 1a
2011 09 19 LSPE Dev Ops Cookbook 1a
Gene Kim
 
Site-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdfSite-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdf
DeepakGupta747774
 
Network operations center best practices (3)
Network operations center best practices (3)Network operations center best practices (3)
Network operations center best practices (3)
Gabby Nizri
 

Similar to SOC Lessons from DevOps and SRE by Anton Chuvakin (20)

Future of SOC: More Security, Less Operations
Future of SOC: More Security, Less OperationsFuture of SOC: More Security, Less Operations
Future of SOC: More Security, Less Operations
 
Agile Network India | Agility Day @Noida | SRE & AIOps | Murugan Muthayan
Agile Network India | Agility Day @Noida | SRE & AIOps | Murugan MuthayanAgile Network India | Agility Day @Noida | SRE & AIOps | Murugan Muthayan
Agile Network India | Agility Day @Noida | SRE & AIOps | Murugan Muthayan
 
2011 09 18 United "Platitudes, reality and promise"
2011 09 18 United "Platitudes, reality and promise"2011 09 18 United "Platitudes, reality and promise"
2011 09 18 United "Platitudes, reality and promise"
 
2011 09 19 LSPE Dev Ops Cookbook 1a
2011 09 19 LSPE Dev Ops Cookbook 1a2011 09 19 LSPE Dev Ops Cookbook 1a
2011 09 19 LSPE Dev Ops Cookbook 1a
 
2011 06 15 velocity conf from visible ops to dev ops final
2011 06 15 velocity conf   from visible ops to dev ops final2011 06 15 velocity conf   from visible ops to dev ops final
2011 06 15 velocity conf from visible ops to dev ops final
 
What Are IT Environments, and Which Ones Do You Need?
What Are IT Environments, and Which Ones Do You Need?What Are IT Environments, and Which Ones Do You Need?
What Are IT Environments, and Which Ones Do You Need?
 
DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015
 
DevSecCon Keynote
DevSecCon KeynoteDevSecCon Keynote
DevSecCon Keynote
 
Site-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdfSite-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdf
 
ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015
 
The 7 deadly sins of Overall Equipment Effectiveness (OEE)
The 7 deadly sins of Overall Equipment Effectiveness (OEE)The 7 deadly sins of Overall Equipment Effectiveness (OEE)
The 7 deadly sins of Overall Equipment Effectiveness (OEE)
 
Network operations center best practices (3)
Network operations center best practices (3)Network operations center best practices (3)
Network operations center best practices (3)
 
Сергей Баранов. Enterprise DevOps
Сергей Баранов. Enterprise DevOpsСергей Баранов. Enterprise DevOps
Сергей Баранов. Enterprise DevOps
 
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.02014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
 
Putting data science in your business a first utility feedback
Putting data science in your business a first utility feedbackPutting data science in your business a first utility feedback
Putting data science in your business a first utility feedback
 
Building a Security Operations Center (SOC).pdf
Building a Security Operations Center (SOC).pdfBuilding a Security Operations Center (SOC).pdf
Building a Security Operations Center (SOC).pdf
 
Beyond the Scrum: Implementing Lean Software Practices in Your Organization
Beyond the Scrum: Implementing Lean Software Practices in Your OrganizationBeyond the Scrum: Implementing Lean Software Practices in Your Organization
Beyond the Scrum: Implementing Lean Software Practices in Your Organization
 
Business Process Modeling & Automation: Where are we?
Business Process Modeling & Automation: Where are we?Business Process Modeling & Automation: Where are we?
Business Process Modeling & Automation: Where are we?
 
IBM Innovate - Uderstanding DevOps
IBM Innovate - Uderstanding DevOpsIBM Innovate - Uderstanding DevOps
IBM Innovate - Uderstanding DevOps
 
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingThe Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
 

More from Anton Chuvakin

More from Anton Chuvakin (20)

SOC Meets Cloud: What Breaks, What Changes, What to Do?
SOC Meets Cloud: What Breaks, What Changes, What to Do?SOC Meets Cloud: What Breaks, What Changes, What to Do?
SOC Meets Cloud: What Breaks, What Changes, What to Do?
 
Meet the Ghost of SecOps Future by Anton Chuvakin
Meet the Ghost of SecOps Future by Anton ChuvakinMeet the Ghost of SecOps Future by Anton Chuvakin
Meet the Ghost of SecOps Future by Anton Chuvakin
 
SANS Webinar: The Future of Log Centralization for SIEMs and DFIR – Is the En...
SANS Webinar: The Future of Log Centralization for SIEMs and DFIR – Is the En...SANS Webinar: The Future of Log Centralization for SIEMs and DFIR – Is the En...
SANS Webinar: The Future of Log Centralization for SIEMs and DFIR – Is the En...
 
20 Years of SIEM - SANS Webinar 2022
20 Years of SIEM - SANS Webinar 202220 Years of SIEM - SANS Webinar 2022
20 Years of SIEM - SANS Webinar 2022
 
SOCstock 2020 Groovy SOC Tunes aka Modern SOC Trends
SOCstock 2020  Groovy SOC Tunes aka Modern SOC TrendsSOCstock 2020  Groovy SOC Tunes aka Modern SOC Trends
SOCstock 2020 Groovy SOC Tunes aka Modern SOC Trends
 
Anton's 2020 SIEM Best and Worst Practices - in Brief
Anton's 2020 SIEM Best and Worst Practices - in BriefAnton's 2020 SIEM Best and Worst Practices - in Brief
Anton's 2020 SIEM Best and Worst Practices - in Brief
 
Generic siem how_2017
Generic siem how_2017Generic siem how_2017
Generic siem how_2017
 
Tips on SIEM Ops 2015
Tips on SIEM Ops 2015Tips on SIEM Ops 2015
Tips on SIEM Ops 2015
 
Five SIEM Futures (2012)
Five SIEM Futures (2012)Five SIEM Futures (2012)
Five SIEM Futures (2012)
 
RSA 2016 Security Analytics Presentation
RSA 2016 Security Analytics PresentationRSA 2016 Security Analytics Presentation
RSA 2016 Security Analytics Presentation
 
Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton ChuvakinFive Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
 
Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton ChuvakinFive Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
 
Practical Strategies to Compliance and Security with SIEM by Dr. Anton Chuvakin
Practical Strategies to Compliance and Security with SIEM by Dr. Anton ChuvakinPractical Strategies to Compliance and Security with SIEM by Dr. Anton Chuvakin
Practical Strategies to Compliance and Security with SIEM by Dr. Anton Chuvakin
 
SIEM Primer:
SIEM Primer:SIEM Primer:
SIEM Primer:
 
Log management and compliance: What's the real story? by Dr. Anton Chuvakin
Log management and compliance: What's the real story? by Dr. Anton ChuvakinLog management and compliance: What's the real story? by Dr. Anton Chuvakin
Log management and compliance: What's the real story? by Dr. Anton Chuvakin
 
On Content-Aware SIEM by Dr. Anton Chuvakin
On Content-Aware SIEM by Dr. Anton ChuvakinOn Content-Aware SIEM by Dr. Anton Chuvakin
On Content-Aware SIEM by Dr. Anton Chuvakin
 
Making Log Data Useful: SIEM and Log Management Together by Dr. Anton Chuvakin
Making Log Data Useful: SIEM and Log Management Together by Dr. Anton ChuvakinMaking Log Data Useful: SIEM and Log Management Together by Dr. Anton Chuvakin
Making Log Data Useful: SIEM and Log Management Together by Dr. Anton Chuvakin
 
PCI 2.0 What's Next for PCI DSS by Dr. Anton Chuvakin
PCI 2.0 What's Next for PCI DSS  by Dr. Anton ChuvakinPCI 2.0 What's Next for PCI DSS  by Dr. Anton Chuvakin
PCI 2.0 What's Next for PCI DSS by Dr. Anton Chuvakin
 
How to Gain Visibility and Control: Compliance Mandates, Security Threats and...
How to Gain Visibility and Control: Compliance Mandates, Security Threats and...How to Gain Visibility and Control: Compliance Mandates, Security Threats and...
How to Gain Visibility and Control: Compliance Mandates, Security Threats and...
 
Navigating the Data Stream without Boiling the Ocean:: Case Studies in Effec...
Navigating the Data Stream without Boiling the Ocean::  Case Studies in Effec...Navigating the Data Stream without Boiling the Ocean::  Case Studies in Effec...
Navigating the Data Stream without Boiling the Ocean:: Case Studies in Effec...
 

Recently uploaded

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

SOC Lessons from DevOps and SRE by Anton Chuvakin

  • 1. Dr. Anton Chuvakin Office of the CISO, Google Cloud
  • 3. A security operations center provides centralized and consolidated cybersecurity incident prevention, detection and response capabilities. Gartner First Things First: A SOC is Still … a SOC :-) SOC is first a team. Next a process. And it uses technology too.
  • 4. 2003 or 2023? Sec Ops is Ripe for Transformation We can’t store and analyze all data, resulting in blindspots It’s cost prohibitive to ingest all the data we need It takes too long to investigate alerts We struggle to build effective detection and have too many false positives/negatives Our processes are too manual, we are too slow to respond to and remediate threats We don’t have enough skilled engineers to make everything work
  • 6. BARD SAYS: What is the essence of the DevOps revolution in IT? DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high quality. DevOps is a culture, movement, or practice that emphasizes the collaboration and communication of both software developers and other information technology professionals while automating the process of software delivery and infrastructure changes. The DevOps revolution in IT is about breaking down the silos between development and operations teams and creating a more collaborative and efficient way of working. It’s about automating tasks, reducing manual errors, and improving communication and collaboration. DevOps can help organizations deliver software faster, more reliably, and with higher quality. It can also help organizations reduce costs and improve efficiency.
  • 7. Google vs Enterprise “SecOps” Problem What does Google do? What do most enterprises do? Efficiency Automation/SRE is a mindset – part of the hiring process, part of OKRs, and performance reviews Experimenting with SOAR, full adoption is tough due to minimal automation culture Employee Shortage Requires coding interviews, high pay, attracts the best, invests in growth Hires traditional roles, no coding, rarely outsources, less pay, less growth, more stress Employee Burnout 40/40/20 between eng, operations, and learning Utilization is almost always >100% Expensive Investment in efficiency solves for human costs Cost-prohibitive data ingestion, oftentimes paying SIEM + DIY, increasing $ from complexity Efficacy TI strongly embedded in D&R, mostly utilized towards proactive work, strong collaboration across Alphabet & benefits from developer hygiene CTI team produces great reports, SOC doing fire drills, >90% false positive rate, uneven distribution of skill (Tier 3)
  • 8. 5 SRE Lessons for SOC 03
  • 9. Let’s focus on 5 key areas today Eliminate Toil Use SLOs Evolve Automation Practice Release Engineering Strive for Simplicity
  • 10. Causes of Toil Less Gathering, More Analysis – basics to automate Key Activities To Implement 1. Too much technical debt 2. Priorities or goals are not aligned 3. Lack of training or support 4. Lack of collaboration 5. The business value to fix is too hard to realize ● Gathering machine information ● Gathering user information ● Process executions ● All context needed to help get to final (human) judgement Activity Train your team on toil & automation Create an Automation Queue Implement Blameless Postmortems Conduct Weekly Incident Reviews Implement SOAR Hire Automation Engineer(s) Implement CD/CR pipelines with metrics Eliminate Toil “...manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” 01
  • 11. ● Analyst utilization gets optimized ● More creative work, less toil ● Time back to do more proactive work ● Deeper operationalization of intel ● SecOps can scale with the business! Unit costs per event Evolve Automation 02 10X is an Underestimate!
  • 12. Use SLOs Tips, gotchas, and core metrics to consider 03 Core metrics Metric event volume event source counts pipeline latency triage time median triage time at 95% Incident resolution times Common metrics (false positives, # of incidents, etc) Key tips Gotchas 1. Optimize metrics for optimal value 2. Manage with indicators + objectives 3. Metrics matter (in context) 4. Defeating attackers beats SLOs 5. Choose metrics that actually matter 6. Make your SLO’s open (within your company) ● Fast =/= better – don’t incentivize speed, incentivize thoroughness ● More =/= better – solving 5000 cases manually is not better than automating of that; #NoHeroes
  • 13. Practice Release Engineering 04 Ad Hoc Visibility Significant Development effort up front to implement playbooks Review playbooks when a major problem occurs Response Orchestration Security Analytics Significant Development effort up front to implement use cases Add/Update detections in response to major new threat Onboard log sources as part of major tech transformation Review logs for new sources when a problem comes up Periodic Quarterly review of playbook performance and effectiveness Dev Sprint to update playbooks Quarterly review of detection efficacy Update/Deprecate ineffective detections Add/Update detections in response to major threat Onboard log sources annual or quarterly planned schedule Review data monthly for new log sources and to identify issues/outages Continuous Real-time alerts for detection efficacy drift Update/Deprecate ineffective detections at point of discovery Active Threat Monitoring to proactively identify new threats to build detections for Onboard new log sources as they are ready. Real-time identification of new log sources or log drops Automatic creation of alerts for handling Live Dashboards showing performance and accuracy metrics for playbooks Update/Deprecate ineffective playbooks at point of discovery Daily Review of SecOps work queues to identify automation opportunities
  • 14. "Complex systems require substantial human expertise in their operation and management. This expertise changes in character as technology changes but it also changes because of the need to replace experts who leave. In every case, training and refinement of skill and expertise is one part of the function of the system itself. At any moment, therefore, a given complex system will contain practitioners and trainees with varying degrees of expertise. Critical issues related to expertise arise from (1) the need to use scarce expertise as a resource for the most difficult or demanding production needs and (2) the need to develop expertise for future use." Human expertise in complex systems is constantly changing Strive for Simplicity 05 One consequence of not striving for simplicity https://how.complexsystems.fail/
  • 15. Actions Reduce toil in your SOC - shift toil to machines Use SLOs / metrics to drive change Evolve automation in SIEM, SOAR, threat intel, etc Practice release engineering for consistent improvement Strive for simplicity with processes, technology stack, etc
  • 16. Improvement The Power of Continuous Improvement Exponential growth happens faster when compounded more frequently Organizing your people and processes around continuous improvement means more agility and less resources Periodic improvement strategies leave capability gaps between sprints Time
  • 17. Resources “Achieving Autonomic Security Operations: Reducing toil” “Achieving Autonomic Security Operations: Why metrics matter (but not how you think)” “More SRE Lessons for SOC: Release Engineering Ideas” “Achieving Autonomic Security Operations: Automation as a Force Multiplier” “More SRE Lessons for SOC: Simplicity Helps Security”

Editor's Notes

  1. “SOCless SOC” and “Fusion center”
  2. And security operations is definitely ripe for transformation - many of the challenges that secops teams are faced with have been around for years.
  3. Project deep dive
  4. Add speaker notes or “Paste without formatting” (⌘+Shift+V on Mac) to retain this optimal font size for presenting in MP7 Maximum 5-6 bullets per slide If presenting someplace other than SVL-MP7-Valley Oak, reduce the speaker notes font size
  5. Shelly Advanced API security is an add on to Apigee which is focused on premium security services. It helps users design and build secure APIs. As being part of the API management platform, we are embedded in the entire API lifecycle and are able to provide visibility and controls to API security configurations Operate securely means how do you secure your APIs in runtime. We detect any abuse on your APIs logic or sensitive information and provide in-product dashboards or integration with SIEM for further analysis and alerting. Lastly, we bring to Apigee the experience we have in Google with security and Machine Learning in order to improve their security posture.
  6. Project deep dive
  7. We went through a significant modernization effort ourselves, especially in the years after the Aurora attack In 2015, we had minimal automation in place, and there was a high unit cost to managing D&R events Over the course of years, Alphabet’s estate grew exponentially, but we were able to achieve a 90% level of efficiency, thanks to our program grounded in SRE-based approaches Through the years, this radical focus on automation freed up time to allow our engineers to focus on higher order events More creative work, less toil based work More proactive work and threat hunting Better consumption, creation, and operationalization of threat intelligence across our workflows And most importantly, our engineers have significant influence with upstream development teams, where entire classes of threats can be mitigated before it hits the D&R workflow We’ve taken these learnings and paired them with our commercial capabilities to help our customers transform their SOC
  8. https://medium.com/anton-on-security/more-sre-lessons-for-soc-simplicity-helps-security-1f8a739ca422